FOREWORD 



Scientific and technological progress are today the major underlying forces of economic and 
social growth. They afford great stimulus to viable economies, improved standards of living, 
adequate health care, effective transportation, and an international communication network. 
Continuing progress requires a continuing increase of our understanding of the physical world that 
surrounds us, and of the laws of nature which govern it. Such understanding is the goal of scientific 
research, valid for both the physical world and the social world. The companion effort, the 
application of understanding in the solution of practical problems, constitutes technology. 

One key to the effectiveness of science and technology is their ability to apply yesterday's 
discoveries to today’s problems. The warehouse of knowledge already gained - the scientific and 
technical literature - is a major national and global asset. But using that asset requires locating the 
relevant information packages in the warehouse and matching them to the problem at hand. Since the 
warehouse acquires more than two million pages of new information each year, and since the content 
is not always fully identified, the user frequently requires help. Moreover, information quality is not 
uniform, and the wise user seeks assurance of the accuracy of the information he uses. 

Meeting these user needs - accessibility of relevant information and assurance of its quality - is 
the reason for existence of the Information Analysis Center. 

The Committee on Scientific and Technical Information (COSATI) and its parent 
organization, the Federal Council for Science and Technology, have been keenly aware of the vital 
role of Information Analysis Centers. In 1967 COSATI established a Panel on Information Analysis 
Centers to provide government-wide focus on their functions and problems. 



In the first year of its existence, this Panel sponsored a Forum to help information analysis 
center representatives to exchange ideas with one another, and with their sponsors and other 
government program officials. The Proceedings of that Forum, held November 7-8, 1967, have been 
published and are available from the National Technical Information Service as document number 
PB 177051. 

Continuing explorations during the following four years sharpened the Panel's awareness of 
the technical problems and opportunities faced by Information Analysis Centers. We also recognized 
that government-wide requirements for cost recovery in information programs were becoming a 
major concern for Information Analysis Centers. After receiving a strong favorable recommendation 
from a canvass of a number of Information Analysis Centers, Panel 6 decided to sponsor a second 
Forum in the spring of 1971. The theme selected was the management of information analysis 
centers, with particular attention to key problems identified by center managers. 

The Forum was held on May 17, 18, and 19, 1971, at the National Bureau of Standards in 
Gaithersburg, Maryland. The first session, an overview, featured a keynote address by Dr. Lewis M. 
Branscomb, Director of NBS. Also included were introductory comments on the three major problem 
areas which composed the subsequent three sessions of the Forum. Dr. Ruth M. Davis spoke on 
Information Analysis Centers and automatic data processing. Dr. Byron Riegel commented on 
Information Analysis Centers and abstracting and indexing services. Dr. H. W. Koch surveyed the 
problems of marketing in relation to Information Analysis Centers. 

The afternoon session on May 17 provided a detailed examination of automatic data 
processing operations and applications. Following dinner that evening, the Honorable James H. 
Wakelin, Jr., Assistant Secretary of Commerce for Science and Technology, addressed the group on 
the national stake in better technical information. The next morning and afternoon sessions resumed 
close scrutiny of important problem areas - the use of abstracting and indexing services in 
information analysis centers, and marketing. The final session, on the morning of May 19th, was 
given over to tours of information analysis centers in the Washington area, demonstrations of 
computerized operations, and meetings of several common-interest groups under sponsorship of 
individual Federal agencies. 

The members of the COSATI Panel on Information Analysis Centers consider that this 
Forum provided important understanding to center managers about future developments in 
acquiring, handling, and disseminating technical information relevant to their missions. The 1971 
Forum may also be regarded as a fulfillment of promises, made at the 1967 Forum, for closer 
attention to problems which the earlier meeting defined, but did not study in detail. 

The Proceedings which follow record the discussions that took place on May 17 and 18, 
1971. They contain much practical advice beneficial to all managers of technical information 
activities, whether or not they attended the 1971 Forum. 

E. L. Brady, Chairman 
COSATI Panel on Information 
Analysis Centers 









WELCOMING REMARKS 

Col. Andrew A. Aines 
Office of Science & Technology* 



Ladies and Gentlemen: 

Let me open my remarks by stating that I am delighted to see you all here today. It is 
evidence that the conference is deemed to be important by all of you who operate or sponsor 
information analysis centers. It is also good to see so many survivors of a grim period for science and 
technology. Despite our troubles, it betokens, it seems to me, a growing interest and understanding on 
the part of management. If this were not the case, there would be fewer information analysis centers 
in existence. 

Again, I would like to extend to Dr. Branscomb the thanks of COSATI for making available 
this beautiful facility for this meeting. The National Bureau of Standards by any way of reckoning is 
one of the most outstanding laboratories in the world. Its receptivity to, and its participation in, the 
crusade for better dissemination and use of scientific and technical knowledge is second to none. Its 
assistance to the Science Adviser to the President in both his Office of Science and Technology and 
Federal Council Programs has brought me great pleasure and the country even greater profit. 

Since the first Conference on Information Analysis Centers a few short years back, there has 
been considerable progress for IACs. 

This is evident in the growing number of centers, in the increasing interest in IACs in other 
countries, in the introduction of management principles that will, in the long run, strengthen the 
position and the contributions of IACs, and in a relative way, the growing understanding of the 
promise of IACs by scientists, engineers, and managers. 

To a man, Presidential Science Advisers and their staff people have been staunch supporters 
of IACs, but, in recent months, I have been strengthened in my belief that they will play an even 
more vital role in the future. They are a logical intellectual extension or balance to growing 
mechanization of information and data systems. They will help us in the task of information and data 
utilization as well as screening and compacting the literature. I used to think of them earlier as being 
most useful at the cutting edge of science, but I have become convinced that they can make great 
contributions in the solution of the complex problems of society, certainly to aid decision-makers and 
problem-solvers, as well as scientists and engineers. 

It is my hope that you have assembled - you leaders in the IAC community - with 
well-defined, hard-boiled objectives; that you will make it a serious meeting of serious people with 
serious problems. I hope that you will ask such questions as: Where are we now? Where do we want 
to go? What obstacles and problems do we have to overcome? What actions do we need to take as 
individuals? What progress can we make via group action? I am not suggesting that you wear hair 
shirts and engage in self-flagellation while you are here. I hope that this will be a meeting to enjoy 
because all of the vibrations will be favorable to new insights and progress. 

*Now located at the Office of Science Information Service, National Science Foundation, Washington, D.C. 

So I will conclude my brief welcoming remarks with the friendly challenge to make this 
Conference a great success, a high watermark of accomplishment. I urge you to achieve new insights 
and understanding so that when you leave on Wednesday, you will carry away with you the feeling of 
certainty that you have made a seven-league leap forward, and that you return to your centers 
refreshed and ready to make successful inroads on your problems. Good luck. Have a wonderful 
conference, ladies and gentlemen. 



KEYNOTE ADDRESS 



Lewis M. Branscomb 
Director, 

National Bureau of Standards 



INFORMATION ANALYSIS CENTERS: THE CHALLENGE OF BEING NEEDED 

“The wise man does not act without attempting to know the consequences of his actions. 
Contemporary societies must be more prudent in their actions if technology is to be a boon rather 
than a curse for mankind. Information is the key to the wise management of our future.” 

“Perhaps the most important event of the next decade will be the recognition of the true value 
of information - the right information, reliable and relevant to our needs, available in useful form to 
all those who need it.” 

So begins a report to the Secretary General of the OECD entitled “Information for a 
Changing Society: Some Policy Considerations.”¹ No better case need be made for the importance 
and future potential of information analysis centers. 

I will not attempt to define what I mean by an information analysis center. COSATI must 
have an official definition that satisfies its needs. Let me say only that such a center is a contemporary 
institutional mechanism for organizing, evaluating and making available the numerical and 
phenomenological information which results from research and observation and which is needed by 
people other than those who generated it. I will restrict myself to science and technology since 
COSATI is similarly oriented. But neither I, nor COSATI, believe that such a restriction implies that 
scientific and technical information is always fully useful to decision makers unless they also have 
associated economic and social data. Indeed the most useful contribution of the information analysis 
centers in science and technology may well be to demonstrate the importance and practicality of 
achieving objectivity and credibility in the effective utilization of organized information. The 
opportunity for the social sciences to contribute decisively to rational decision making in public 
policy depends critically on their developing similar capabilities. 

The information analysis center serves as the cerebral cortex of the technical nervous system. 
In the human body, a signal from eye or ear calls up a search for stored information, which must be 
selected for its relevance and its reliability. The output signal activates the appropriate muscles and 
produces whatever action is required. Similarly, the information analysis center couples the 
unassimilated knowledge of basic science to our technological muscles. When the brain works well, 
we take it pretty much for granted. When it works poorly, walking down steps or riding a bicycle can 
be a terrifying experience. In technology it is not quite so obvious that the cortex is too small, is 
deprived of oxygen, and is coupled to only a small fraction of the brain’s memory cells. Why? 

First, our society is - from a technological point of view - at a very primitive stage of 
evolution. We are just beginning to crawl up out of the sea, so to speak. We accept that research is a 
high risk proposition and that scientific creativity is a delicate flower. But we fail to realize that this is 
no excuse for failing to organize our science and technology system into a functioning whole. Nor is it 
an excuse for failing to introduce improved quality assessment into science. Most people expect the 
data they use to be uncertain, and so place minimum reliance on it. It would be very interesting to 
know how many small scale pilot plants might have been unnecessary if there had been an adequate 
basis for confidence in the theoretical prediction of the efficiency of a full scale system. 

¹ Unpublished at this writing. Publication by OECD anticipated during 1971. 

Unfortunately, this skepticism about the reliability of published data is well justified. If we 
need a reminder, there is the story of the rocket fuel production plant that was shut down - after the 
reported expenditure of 200 million dollars - when it was found that its process was based on an 
erroneous value of the heat of formation of a light metal oxide. 

Second, the provision of stimulation for civilian industrial technology characteristic of the 
science policy of the past decade has depended too much on “trickle down” and “drip off.” The 
Federal Government buys research and development (R&D) to get technology for such government 
operations as defense and space exploration. “Spin off” benefits to the private sector - or “drip off” 
as Ralph Lapp calls them - are often incidental to this investment. Agencies like the Commerce 
Department’s National Technical Information Service and the Smithsonian’s Science Information 
Exchange are admirable means for facilitating access to the publications and ongoing projects of this 
federal effort. But the mission-oriented Federal R&D system is not intended to focus primarily on 
provision of useful data to the public. There are, of course, some excellent exceptions - represented 
by the information analysis centers that do exist. In the absence of enough information analysis 
centers, those who need data produced in government projects must search the project literature for 
it. 



“Trickle down” refers to the weakness of the mechanisms for ensuring that the results of our 
$2 billion national investment in basic research find their way into applicable technological choice. 
The Federal Government pays for most of the research but too often it stops short of accepting 
responsibility for the effective evaluation, secondary processing, and dissemination of the results. If 
this responsibility were accepted we would have to put the horse back in front of the cart by focusing 
applied research on information needs, buttressing this investment by appropriate new research to 
ensure availability of information not yet in existence. 

The third reason that information analysis centers have been undervalued is reflected in the 
quotation I read from the OECD report. People are not accustomed to placing a realistic value on 
information. Part of the problem, of course, is that information is used in decisions, be they 
managerial or technological. How do you place a dollar value on a decision? We all know that a bad 
decision can be very expensive - witness the experimental rocket fuel plant mentioned earlier. But 
there is no established economic measure for decision quality. How then can we put a price on 
information? Perhaps this ability will come in time from decision theory, which provides a technique 
for quantifying the value of information. But the more traditional answer is that information can be 
priced as other commodities are priced - in the marketplace. But the ability of information analysis 
centers to command a good but appropriate price for their product is limited by a number of factors: 



• Inadequate economies of scale resulting from reaching too small a fraction of the potential 
market; 

• Traditionalist attitudes of the technical community toward information transfer mechanisms 
of new kinds, combined with 

• A tenacious and well justified desire of scientists to reserve their dependence on 
information sources to those sources whose continuous availability and quality are 
reasonably well assured; 

• Less than full confidence in the reliability of the information products offered, so that the 
user does not risk defraying a substantial cost for information with an even greater cost 
avoidance achieved by relying on it; 

• Less than fully effective marketing of information analysis center products, combined with 
the fact that information is more efficiently and effectively wholesaled than retailed. Thus 
economic return depends on intermediate institutions such as libraries, which are not in a 
position to recover or even measure economic benefits from good information. 

All these handicaps are made greater by the fact that the universities do virtually nothing to 
prepare their students for contemporary innovations in the evaluation and handling of knowledge - a 
rather striking indictment of education since evaluation and transfer of knowledge and experience is 
what education is all about in the first place. In addition, we have not yet had enough experience at 
identifying potential clients for information analysis services - other than the data generating 
community with which most centers are in close contact. A NASA-sponsored study by Denver 
Research Institute² showed that design and production engineers in commercial enterprises of 
moderate technological sophistication get their technical information primarily from commercial 
product sales literature and sales representatives. Government publications ranked near the bottom of 
the list in significance as an information source. Government sponsored science and technology 
information transfer programs have yet to reach the majority of industrial users outside the R and D 
community. 

Finally, the intellectual challenge of information center activity has not yet been fully 
recognized by the peers of those who engage in it. Nevertheless, the information analysis center is an 
old idea whose time has come. The information analysis center will prove indispensable as a means 
to make scientific knowledge quickly available to policy makers in a useful form. It will thus be a 
major factor for ensuring that technology is wisely used for human benefit. The impact of technology 
on society now moves at so swift a pace that there is no longer enough time - or at least no longer 
enough patience - for research to be launched from scratch when hard choices have to be made. The 
dilemma facing those who must make policy on the use of washing detergents is an excellent 
example. We simply do not know enough today about alternative chemicals and their environmental 
effects, their efficiency as laundering agents, their economic impact on washing machine design and 
life. All this information needs to have been gathered, evaluated, and made available yesterday. The 
resultant cost of making the wrong decision with inadequate information might well run into the 
hundreds of millions. 

It sometimes happens that there is no choice between time and economic cost, for the time to 
get new information is intrinsically limited. When Apollo 13 failed on its trip to the moon, the cause 
of the explosion had to be diagnosed and corrective action taken before the next mission could be 
launched. In this case, NBS cryogenic engineers were able to help NASA track down the cause. Some 
very accurate, evaluated data on the thermodynamic properties of liquid oxygen - available from the 
NSRDS center in Boulder - were decisive in choosing correctly between two alternative chains of 
events that might have characterized the failure. 

² “Commercial Application of Missile Space Technology” by John G. Wells, L. G. Marts, et al.; Denver Research 
Institute Report No. N-64-24335; 1963, 262 pages. 

Indeed, the course of science and technology is strongly influenced by the information base on 
hand at the time of a new conceptual breakthrough. NBS established its Atomic Energy Levels 
program several decades ago, largely because of the needs of basic research in atomic structure and 
in astronomy. In time, thanks to the great work of Charlotte Moore Sitterly and her colleagues, this 
center achieved a position as authoritative focus for the analysis of atomic spectral data. In the ’40’s 
and ’50’s there were times when budget makers questioned the steady drain of investment in this 
large effort. But suddenly there was the MASER and the evident possibility of making an optical 
oscillator through coherently stimulated radiation emission: the LASER. In 1966, Professor W. R. 
Bennett of Yale said,³ “The three volumes of critically-compiled data on Atomic Energy Levels 
(NBS Circular No. 467) played an essential role in the development of the gas laser. Without the 
existence of these data the development of the gas laser field ... would have been delayed for many 
years.” What economic benefit shall we ascribe to this application alone? The laser industry is 
growing exponentially, and is now well beyond the $100 million level. Even if the Atomic Energy 
Levels program brought the laser only one year sooner, the carrying charge on the fraction of the 
national debt equal to the annual taxes paid by the new laser industry would have paid for operation 
of the atomic energy levels program for a decade. 

Timeliness of data availability is indeed one of the most important values of the information 
analysis center. But in my personal view, quality and reliability enhancement are by all odds the most 
important benefits that flow from well managed information analysis centers. I suspect they also pose 
the most subtle and difficult problems for information analysis center managers. Accordingly, I am a 
little surprised to note that this management problem does not seem to be on the agenda for this 
meeting. 

I will illustrate the role of reliability by discussing numerical data evaluation, but this 
principle can be extended to nonquantitative forms of information. First let me call to your attention a 
small and informal symposium being sponsored in this auditorium on July 21, 1971, by the U.S. 
National Committee for CODATA. The occasion is the CODATA General Assembly in Washington, 
and the topic will be a discussion of criteria for data evaluation. We will have a panel discussion - 
perhaps a debate - involving information analysis center managers and primary journal editors. The 
meeting is open to the public, and I know Dr. J. Ross MacDonald, Chairman of the Numerical Data 
Advisory Board of the National Academy of Sciences, would want me to invite each of you to attend 
if you can. 

The manner in which data are evaluated determines the impact of that evaluation on the 
primary scientific work to follow. The traditional, or ex cathedra method consists of asking an expert 
(a species for which there is as yet no objective performance criterion) to select the “good” from the 
“bad.” In the absence of an information analysis center, many scientists do not use the results of 
unknown younger scientists until those results have been used in the work of one of the “great men of 
the field.” The frequency with which original data are referenced by citation of a theoretical paper by 
a famous man who used the datum himself, rather than by citation of the original research paper, is 
an indication of this tendency. 

But certification by imprimatur is neither objective nor fair; nor does it lend itself to the 
identification of measures of reliability. Unless quantitative statements can be made about precision 
and systematic errors, you cannot expect users to place reliance on the information, for they cannot 
compute the risk they are taking by using it. 

A second alternative to ex cathedra evaluation is the Delphi or consensus method. Those of 
you familiar with technological forecasting will know that the Delphi method is regarded as a great 
improvement on the educated guess as a means for predicting the future of technical events. It 
consists of the projection into the future of the average of a number of educated guesses by people 
whose education differs at least in some respect. But a committee gives no more assurance of an 
objective reliable result than does a high priest of science. Indeed, the pressures of a committee will 
work for inclusion of everyone’s data whether these data are valid or not. 

The preferred approach is to apply the principles of scientific objectivity to the evaluation 
process, and to ensure that the individuals applying it have demonstrated competence sufficient to the 
task. This process calls for criteria by which data are to be evaluated. If the data are derived from 
experiments, these criteria cannot be developed without a complete theory for the experiment, giving 
systematic error phenomena equal weight and attention with the phenomenon under investigation. (It 
is we scientists who chose to assign the phrase “experimental result” to one phenomenon and 
“systematic error” to other phenomena in an experiment. Nature is unaware of our prejudice in this 
regard, and holds all phenomena in equal esteem. She sometimes punishes us for our narrow vision 
by letting us delude ourselves about what the experiment is we have actually performed.) 

Thus the criteria for evaluation of experimental results become an algorithm for doing a valid 
experiment in the first place. Information centers are obligated to publish the criteria they use to 
referee literature. Those that have won the respect of both users and data generators will find that 
prudent future investigators will take more care when new work is done, not only to do a valid 
experiment but to publish the evidence that their errors were under control. 

The impact of information analysis center operations on fundamental science is well 
documented. For example, the landmark paper issued by Lee Kieffer and Gordon Dunn in Reviews 
of Modern Physics⁴ discussing the state-of-the-art in reliable reference data on cross sections for 
ionization in electron collisions has been analyzed. The article in question appeared in 1966. Since 
then, many authors have cited this article in their own papers. Fifty-three of these articles were read 
and analyzed as to what the impact of the citation was. Twelve of the articles made reference to the 
Kieffer-Dunn article for general citation, background purposes, or news-type items. Twenty-two 
articles cited Kieffer and Dunn in making use of, or referring to, the data which Kieffer and Dunn 
presented, for purposes of computation or comparison between experiment and theory, for 
comparison between one experimental value and another, or for calibration purposes. Nineteen 
articles explicitly recognized the main conclusion of Kieffer and Dunn, that a very large fraction of 
the previously reported results in the field of electron collision cross sections were deficient in their 
reporting of experimental data and in the analysis of systematic errors. These last nineteen articles 
made clear identification of their own attempts to avoid such inadequacies and to present their own 
data in a reliable and meaningful form. Thus, it is clear that over 1/3 of a significant group of 
references to this article showed that research had been influenced by the output of an information 
analysis center, improving the future input. 
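
The tally quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the three counts are the ones reported in the talk, while the category labels and the helper name `share` are hypothetical and were chosen for readability.

```python
# Counts reported in the talk for the 53 citing articles that were read.
# The category labels are paraphrases introduced for this illustration.
citation_counts = {
    "general background or news-type citation": 12,
    "used or compared against the Kieffer-Dunn data": 22,
    "recognized the conclusion on systematic errors": 19,
}


def share(counts, key):
    """Return the fraction of all analyzed articles that fall under `key`."""
    total = sum(counts.values())
    return counts[key] / total


total_articles = sum(citation_counts.values())
influenced = share(citation_counts, "recognized the conclusion on systematic errors")

print(f"{total_articles} articles analyzed")
print(f"{influenced:.1%} recognized the main conclusion")
```

The three categories account for all fifty-three articles, and the nineteen that recognized the main conclusion make up a little more than one third of them, consistent with the "over 1/3" figure in the text.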

Admittedly, the numerically specifiable properties of elementary substances are most suitable 
for this kind of objective evaluation and for the specification of accuracy values. Such data are the 
concern of many of the information analysis centers of the National Standard Reference Data System, 
coordinated at NBS. But it is a generally valid principle that managers of all kinds of centers should 
strive in this same direction. Not only will such procedures have more favorable impact on the 
quality of data sources, but they are also essential to insure a broad and continued acceptability of the 
products of the center. 

What can we say about the role of automatic data processing in the information analysis 
center operation? I would not wish to duplicate any of the discussion already scheduled on this topic. 
Let me only note that if we view the information service business as a service industry and ask what 
we can expect in the way of productivity increases, we see an immediate role for the computer. And 
indeed, the computer is being widely applied to vastly increase the output of product - at constant 
product nature and quality - per unit of input labor. Most centers use computers for internal 
operations - storing, searching, and retrieving bibliographies and often data files as well. Some use 
computerized access to indexing and abstracting services. Some use computers in output or product 
preparation - as in computer typesetting. But only a few, such as MEDLARS, use teleprocessing as 
a means for dissemination and marketing. In my view, the reasons are rooted in simple economics 
and in the handicaps of traditionalism noted above. I am certain that the day is coming when 
real-time access to evaluated data files will be at the fingertips of the individual scientist. Research 
efforts to hasten the day when this will be practical and economic are well justified. But, I also 
believe that the surest way to kill the concept of information analysis centers is to oversell present 
market demand for their products and to force them into symbolic if not actual bankruptcy by tying 
their viability to excessive requirements for capital investment in communications and software at a 
premature stage. 



But the computer can also increase the effective productivity of information analysis centers 
by even more important qualitative changes. Computers permit the application of quantitative 
criteria for data validity of much greater complexity than those normally applied by the subjective 
evaluator. X-ray crystallographers today already have in operation a system for subjecting the data 
in proposed publications to a computer's evaluation. Thus, journal editors have added another 
referee to their staffs, one of great perseverance and undeniable objectivity - a computer. The 
extension of this principle may bring the day when a quantitative method for data evaluation can 
provide guidance for the synthetic creation of new knowledge. 



Theorists are beginning to develop general purpose programs for computing arbitrary 
numbers of useful quantities in terms of input parameters that can be specified by the users. Rather 
than applying the program to a few published cases and destroying it, the information analysis center 
keeps these programs as powerful tools for producing new data on demand. As the center also comes 
into possession of thousands of experimental data values, it can update the accuracy of much of these 
precise, but less accurate, data when new benchmark measurements of great accuracy are made. By 
combining such normalized data with theoretical programs parametrically dependent on the data, 
thus synthesizing new knowledge, the information analysis center will become a focus for analytical 
research of a new kind. As the error control in the data banks of natural properties improves, the 
information noise level represented by today’s data uncertainty will drop. And with improved 
information signal-to-noise level provided by improving selectivity, phenomenology contained but 
concealed in the data will begin to emerge into view. 
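
One simple reading of updating "precise, but less accurate" data against a new benchmark is a rescaling of a relative data set. The sketch below assumes, purely for illustration, that the systematic error is a single multiplicative factor shared by every point; the talk does not specify a procedure, and the function name `renormalize` and all numbers are hypothetical.

```python
def renormalize(values, old_reference, benchmark):
    """Rescale a precise-but-biased data set by the ratio benchmark / old_reference.

    Assumes one common multiplicative scale error (an assumption of this sketch).
    """
    if old_reference == 0:
        raise ValueError("reference value must be non-zero")
    scale = benchmark / old_reference
    return [v * scale for v in values]


# Hypothetical relative measurements: internally consistent, absolute scale uncertain.
relative_data = [1.00, 2.10, 3.05, 4.20]

# Hypothetical new benchmark measurement of the first quantity.
corrected = renormalize(relative_data, old_reference=relative_data[0], benchmark=1.05)

for before, after in zip(relative_data, corrected):
    print(f"{before:.2f} -> {after:.4f}")
```

The ratios between points are left unchanged, so the precision of the original set is kept, while the absolute scale takes on the accuracy of the benchmark. A real center would need to justify the single-scale-factor assumption and carry the benchmark's own uncertainty through to the corrected values.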

The emerging role of the information analysis center in dealing with basic physical properties 
as a source of new knowledge is perhaps only dimly seen at this time. If we look to the information 
analysis center dealing with information on much more poorly characterized systems - such as data 
on the incidence and circumstances of building fires - we see that the information analysis center 
creates valuable new knowledge today. Nowhere is this better illustrated than in information centers 
designed to identify the needs for government regulation of technology. 

It is relatively easy to prove that there are too many accidental fires, and too many people are 
killed. It is easy to show that thousands are hurt by needlessly dangerous household products. But it 
is often very difficult for the official with responsibility for setting mandatory standards to identify 
the chain of events that leads from exposure to risk to the initiation of a dangerous set of events and 
then to the final tragic injury. Only well organized systems for acquiring data on injury, on the 
frequency of exposure to risk, and on the nature of individual vulnerability, and systems for the 
critical evaluation of such data can provide the degree of confidence needed to be sure the mandatory 
rule will in fact lead to a reduction of injury. It is urgent that this nation insure its ability to regulate 
technology in a rational way that does indeed satisfy public expectations of benefit which are 
expressed in the new authorities being provided by Congress. These new authorities, when exercised,
drive up costs and limit technological alternatives. If a better environment, safer cars, toys, and 
household products, uncontaminated food and less risk of fire do not result, the present unhappiness 
concerning science and technology could become a rebellion. A large array of increasingly 
sophisticated information evaluation centers is required for such purposes.

I hope so far I have convinced you of the value, potential, and present need for information 
analysis centers. Perhaps, as managers of information analysis centers, you weren’t very hard to 
convince. Let me now address briefly the question: 

How can you measure the success of an IAC?

I would ask three sets of questions: 

Of users, I would ask: Tell me in what way you rely on the information from this center to a
greater extent than you would on access to the original literature from which the information came?
What would you have done instead if the center’s product had not been available to you? What are 
you prepared to do — including paying money — to insure your continued access? 

Of the center itself, I would ask: What evidence can you show me that the information you 
are now receiving is of better quality as a result of your prior work? To what extent have the 
demands of your customers reflected themselves in the priorities of the data generators who feed you 
their material? How much cooperation do you get from the data generators whose scientific 
reputations you hold in trust when you evaluate their data? 




And of the R&D policy makers in government and industry, I would ask: How confident are
you that the research you are buying reaches those who need it in an optimum fashion for its effective 
utilization? Is your own access to the information you need for decisions being provided for? What 
fraction of the market for evaluated information is being satisfied? 

What does it take for a Center to be successful? I believe six requirements are necessary and 
sufficient: 

1) Competence: Reflected in the continued involvement of the Center’s intellectual
leadership in creative work and thus in the scholarship of the evaluation done. 

2) Continuity: A long-term commitment to generating confidence by the user and the data 
generating communities in the competence of the team. 

3) Completeness: The user must know within the scope of his inquiry that the Center’s 
coverage is complete; otherwise he cannot assess his risk in relying totally on the 
Center’s information product. 

4) Conscience: Realizing the plight of the user from another discipline who cannot evaluate 
the information even if he could find it, and the fate of the data generator whose paper 
will never be read by the scientists of tomorrow, because they will instead rely on and 
refer to the evaluated output of the Center.

5) Cash: To finance the very demanding and expensive scholarship without which the 
primary values of information analysis centers are lost. This cost cannot be passed on to 
the retail customer. 



How big is the job, and how much cash will it take? Today there are over 500,000 scientists 
and engineers employed in a $27 billion national R&D enterprise. Over 50,000 of these are in basic 
research ($3.8 billion). I have no idea how many are engaged in generating quantitative, storable
information in the public domain, but let’s suppose the number is similar to those supported by the
total basic research budget. A good review covers careful study of perhaps 50 to 100 papers on an
average, of which about 10% might have been contributed in the preceding year. It takes at least as 
long to do the research for such an evaluation and review as it does to prepare one of the original 
research investigations (about a year on average). If one updates the criteria for evaluation every 2 to 
5 years, we need from 2 to 5% of the data generating workforce engaged in data analysis evaluation 
and review. This might be somewhere in the ball park of $40 to $100 million per annum.
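
The arithmetic behind this estimate can be sketched roughly as follows. This is only an illustration: the workforce figure, the 2 to 5% share, and the dollar range are taken from the paragraph above, while the function names and the implied cost per evaluator are assumptions introduced here and are not stated in the talk.

```python
# Back-of-envelope version of the staffing estimate in the text.
# Values marked "from the text" are the speaker's; the rest is assumed for illustration.

DATA_GENERATORS = 50_000            # from the text: supposed equal to the basic-research workforce
EVALUATION_SHARE = (0.02, 0.05)     # from the text: 2 to 5% of that workforce
ANNUAL_COST_RANGE = (40e6, 100e6)   # from the text: $40 to $100 million per annum


def evaluator_headcount(workforce, share):
    """Number of people engaged in evaluation and review for a given share."""
    return workforce * share


def implied_cost_per_evaluator(annual_cost, headcount):
    """Annual cost per evaluator implied by a total budget (derived, not stated in the text)."""
    return annual_cost / headcount


low_staff = evaluator_headcount(DATA_GENERATORS, EVALUATION_SHARE[0])
high_staff = evaluator_headcount(DATA_GENERATORS, EVALUATION_SHARE[1])
low_unit_cost = implied_cost_per_evaluator(ANNUAL_COST_RANGE[0], low_staff)
high_unit_cost = implied_cost_per_evaluator(ANNUAL_COST_RANGE[1], high_staff)

print(f"evaluators needed: {low_staff:,.0f} to {high_staff:,.0f}")
print(f"implied cost per evaluator-year: ${low_unit_cost:,.0f} to ${high_unit_cost:,.0f}")
```

Under these assumptions both ends of the range imply the same cost per evaluator-year, which suggests the dollar figure was obtained by applying a single per-person cost to the headcount.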

Now that estimate is not a budget justification, because it is an input investment, not a 
measure of product value. But it does let me ask: Should we view information analysis center activity 
as a necessary adjunct to our national long-range research investment? Should we tax basic research 2
to 5% to provide for evaluation and preparation for use of their data? A good case can be made for
this approach. But if the R&D user community were prepared to pay from 0.08% to 0.2% of their 
costs for organized, reliable information the information analysis centers would be well provided for. 
If we can then find ways to overcome the present handicaps faced by the Centers and finance their 
production and marketing operations in a reasonable way, we may so upgrade the regard with which
good information is held that we generate a thirst for more basic research to feed the information
system. When the IAC’s thus become the main force for generating public support for increased
financial support for basic research, the informational evaluators can stop worrying about being 
second class citizens in the scientific community. 



I would like to close my remarks as I opened them, with a quotation from the OECD report 
referred to earlier. While the advice was originally intended for all OECD member governments, I
believe it should be taken particularly to heart by our own. I quote from selected paragraphs of the
report’s conclusions and recommendations. 



“Current information systems generated by research workers primarily for their own
requirements are well established but most are quite inadequate for users in other
disciplines and in technology, and are increasingly inadequate in their own disciplines.

“Recommendation 5 - We recommend that governments give greater support to mechanisms
for insuring effective interchange of information among scientists, giving explicit recognition to the
key importance of informal systems, of which international personal contact and oral communication
are an increasingly vital part. We further recommend that governments devote more effort to 
experiments in improving information transfer between scientists, particularly between scientists of 
different disciplines, and between scientists and non-specialists. Various kinds of information 
analysis, consolidation, evaluation, and repackaging can be envisaged here, and the different kinds of 
specialized information centers and information analysis centers have a vital role to play in 
improving the value - to science and to technology - of the national investments in R&D. These
activities will improve both the quality and the usefulness of information in the hands of those who 
need it. 



“We recommend that governments at the highest level accord priority attention not only to 
the development of policies for the generation of scientific and technical information, but also to the 
development of policy for the efficient and prudent use of such information in policy formulation, in 
the conduct of the affairs of government, and in R&D management. 

“Proper handling of scientific and technical information must not be regarded as an 
administrative or mechanical matter, to be considered apart from (and often after) the design of R&D 
strategy. Systems for dealing with scientific and technical information have quite different 
requirements for the four spheres of activity in which information is used: for the conduct of science 
itself (for which most current systems are designed), for the effective generation of technology and its 
application in industry, for decision making and policy formulation, and for the enlightenment of the 
general public through education and public information. 

“We recommend that policies and strategies for scientific and technical information should be 
developed as an integral part of the design of policy as a whole and R&D policy in particular, in such 
manner that in each of the above areas of public concern provision is made in advance for the 
scientific and technical information system requirements. Thus, the focus for policy concern in 
scientific and technical information should be closely associated with the focus of responsibility for 
R&D policy itself.”

When one is discussing computers and Information Analysis Centers at the same time, one
feels obliged to start out with the phrase, “In the beginning there was Vannevar Bush; then came 
Alvin Weinberg’s report.” In that report, in 1963, some pertinent recommendations were made.
Looking at them today would make everyone a believer in the assertion that there is nothing new
under the sun. 

Dr. Branscomb said that one has the responsibility of deciding how to determine one’s own
progress. I think, in many instances, we are not sure, ourselves, when we have actually achieved a
result, or anything we could call an innovation. We have not ascertained a means for making those 
kinds of assessments. For example, in 1963 the Weinberg Report said 

“. . . colleges must educate in the art of handling information more professionals who
can lighten the burden of the technical man and can invent new techniques of 
information retrieval”. 

In 1971, we have a little over a hundred departments in universities which grant either a 
Bachelor’s, Master’s, or a Ph.D. degree in computer sciences; certainly computers have been
recognized as a principal means for manipulating information. We have, however, only about 25 
departments of information science in universities; a surprising number of these are actually 
“interdisciplinary” programs. A degree in an interdisciplinary program has, at the moment, rather
uncertain market value after a few years. While “interdisciplinary” has the connotation of 
encompassing more than can any one given department in a university and, therefore, in some way 
not being able to be subsumed under any of them, it also has the connotation of not being adequately 
enough defined to become a department in itself, with a developed curriculum that will be able to 
meet stated objectives in education or in research. 

This same report (Weinberg) in 1963 said 

“. . . The technical community must explore and exploit new switching
methods. . . . The information transfer network is held together by an array of 
switching devices that connect the user with the information (as contrasted with the 
documents) he needs. As the amount of information grows, more ingenuity will be 
needed to find effective switching mechanisms. . . . The technical community must
courageously explore new modes for information processing and retrieval.” 

Now when we say switching we normally think of that type which involves some sort of 
automation. In this area, when we try to determine progress, we reach this impasse of not knowing 
how to judge progress. Certainly, if we look at connecting the user with information, as opposed to 
connecting him with the documents that he wants, we are faced with the complex problem of 
electronically connecting users, geographically dispersed, with information at locations different from
their own. The implication in the Weinberg report was that one neither moves the users to the
information nor moves all the information to each individual user. We then realize that we haven’t
even been able, as yet, to define the reality of this goal. The goal of delivering information, as 
opposed to documents, is essentially the goal for information networks. But the means of meshing 
network components, and making them work, will keep us all busy for some time. 

Let me now describe a goal of remote browsing. This simply means that I can sit in my office 
and with a telefactoring device, meaning any kind of mechanism which extends my capabilities, I can
connect myself to remotely located information media, be they magnetic tape, paper, films, 
photography or books. I can connect myself in such a way as to remotely browse through these 
holdings, make my own selection of information through query, pull books from stacks, look at a set 
of photographs or query a computer from a console, xerox or duplicate at my own convenience the 
information I need, stamp it and put it into a mail chute at that information center and mail it to 
myself. Now, surprisingly enough, this is a capability that has been costed out to a limited extent and 
for which most of the technology exists in scattered segments. It is a capability towards which the
Weinberg report pointed some eight years ago. 

Another recommendation in the Weinberg report stated 

“Among the schemes that ought to be exploited more fully are: 

a. Specialized Information Centers. . .

b. Central Depositories. . .

c. Mechanized Information Processing. . .

Commercially available equipment is not the remedy in every case; . . . There is a
need for equipment specifically designed to retrieve documents from very large
collections. . .

d. Development of software. . . Software, including methods of analyzing,
indexing and programming, is at least as necessary as hardware for successful
information retrieval. . .

. . . Uniformity and compatibility are desirable. . . Switching will be fully effective
only if the different subsystems adopt uniform practices towards abstracting and
indexing”.

There were a number of people, both in and out of the Government, who took these 
recommendations seriously, both then and now. In the early 1960’s, there were a tremendous number 
of projects aimed at this particular set of objectives, all of them oriented around the use of automatic 
data processing equipment or complementary type equipment. These projects included the 
development of associative memories with which one could rapidly scan, in a non-structured way, large
stores of textual information, and retrieve with one search all of the information that met certain 
criteria. They included the first steps toward automatic on-line indexing with consoles, where one 
could have displayed on a cathode ray tube the text that one was editing and have, perhaps on a 
second cathode ray tube, the rules for indexing and/or abstracting. Using a light pen, as an extension 
of one’s pencil, one could index and/or abstract these articles on-line with the computer, have the 
information automatically inserted into the computer, and thus be available for retrieval. They also 
included the development of techniques for coupling document storage media with the indices or the 
means of retrieving documents. 




We have since seen the design and development of several models of trillion-bit memories
and of film storage devices for handling microtext. None of these have been operating long enough to
permit cost-effectiveness or comparative analyses. We are still in what I call the stage of “Tweezers 
Technology” with respect to retrieving documents in reduced image forms; this means that the best 
techniques still involve the use of tweezers to pull a microfiche out from its storage place. We still 
worry about wear and tear on microfilm documents in their retrieval process. 

We still worry about automatic indexing because we haven’t solved the intellectual problems 
of indexing. The time-dependency of indexing is one of our most crucial problems. We don’t 
generally have associated with index terms that very essential qualifier that lets us know when that 
index term was useful. We don’t have a way, therefore, in most automated systems of being able to 
cross-reference to permit automated updating of indexes. 
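
One way to picture the missing time qualifier is as a small record attached to each index term. The sketch below is purely illustrative and does not describe any system mentioned in the talk: the `IndexTerm` fields, the `current_term` helper, and the sample terms and years are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexTerm:
    """An index term with the period in which it was in use (all fields hypothetical)."""
    term: str
    valid_from: int                      # first year the term was used for indexing
    valid_to: Optional[int] = None       # last year of use; None means still current
    superseded_by: Optional[str] = None  # cross-reference to the replacing term

    def in_use(self, year: int) -> bool:
        """True if the term was a valid indexing term in the given year."""
        return self.valid_from <= year and (self.valid_to is None or year <= self.valid_to)


def current_term(thesaurus: Dict[str, IndexTerm], term: str) -> str:
    """Follow superseded_by cross-references until a term with no successor is reached."""
    seen = set()
    while term in thesaurus and thesaurus[term].superseded_by is not None:
        if term in seen:  # guard against circular cross-references
            raise ValueError(f"circular cross-reference at {term!r}")
        seen.add(term)
        term = thesaurus[term].superseded_by
    return term


# Illustrative entries only; the terms and years are invented for the example.
thesaurus = {
    "electronic brains": IndexTerm("electronic brains", 1948, 1955, "computing machines"),
    "computing machines": IndexTerm("computing machines", 1956, 1964, "computers"),
    "computers": IndexTerm("computers", 1965),
}

print(current_term(thesaurus, "electronic brains"))
print(thesaurus["computing machines"].in_use(1960))
```

With a qualifier of this kind, a document indexed under an obsolete term could be re-filed under its successor automatically, which is the cross-referencing the paragraph above says most automated systems lack.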

At the time Dr. Weinberg wrote his report, in 1963, there were 400 Information Analysis 
Centers that he identified. COSATI Panel 6, in the first edition of its Directory in 1968, identified 113
Federally-sponsored Centers. In that list, 24 of these 113 indicated that they were using computers;
that’s about 21%. In 1970, the updated version of the Directory of Federally-sponsored Centers
shows 119 Centers of which 34 indicated that they were using computers; that is about 28%. So,
there was about a 5% increase in the number of Centers in two years and about a 7% increase in 
those that were using computers. 

The manner of using the computer by these Centers was fairly uniform; namely, for the 
compilation, manipulation, and retrieval of data. In some instances, the Centers offered copies of 
data compilations on tape; in other cases, the Centers offered computer programs or use of computer 
facilities for outsiders. In the majority of cases, the computers were the tools by which the Center stored
and retrieved data in order to answer specific inquiries or compile particular lists of data. In a very 
few cases, the computers were being used as repositories for the data. But in no cases were the entire 
text, or the entire information bank that the Center worked with, stored in computer form. 

In discussing the application of computer technology as related to Information Analysis 
Centers, one generally distinguishes between information handling and document handling systems. 
The products inputted, processed and retrieved in document handling systems are documents, 
document surrogates and/or document reference. The equivalent products of information handling 
systems are information and/or data elements. In designing composite systems or service networks 
the general guidance given is to separate the two types of systems. The reasons are that the 
supporting technologies, equipment and manpower resources are significantly different. 

In a way, the Information Analysis Center as it has evolved represents the worst of both 
worlds. It inputs and processes both documents and information. It is generally supposed to output 
only information. The technologies of both document and information handling must be brought to 
bear in building an Information Analysis Center. That particular merge of technologies is one that 
remains a challenge, as opposed to an accomplishment, at this date. The motivation behind merging 
these two technologies has been stated very well by Dr. Branscomb: it is the urgency of information,
and the urgency in which users place their requirements for information, that has somewhat strained 
not only the technologies of information analysis, but also the intellectual prowess necessary for this 
information analysis. 




One of the reasons for this strain, and perhaps unendurable stress, is that the increased 
amount of information that has been available for the last 60 or 70 years has had some remarkable 
effects. First of all, the elapsed time between the initial discovery of an innovation and its recognition 
as a commercial product has decreased, from about 30 years in the early 1900’s to about 16 years 
following World War II. That means a number of things: indexing schemes have to be changed twice 
as often; thesauri, and the number of terms used, are growing twice as fast. The time to translate a
technical discovery into a technical product is now down to five years, from about 12 years at the end
of World War II. The half-life of information, before it becomes not only out of date but - in the case
of fields such as health and drugs - dangerous to use, has decreased from an average in scientific fields
of about seven years to less than five years, and in some cases, in the computer field, for example, to 
about three and one-half years. 

All of this means that it’s difficult not only to read the information, but to use it: a 
state-of-the-art review which now takes two years may well be considered an historical survey. Unless 
we find ways of aiding the intellectual process of making state-of-the-art reviews, and unless we find 
ways of assimilating information faster than we can do it manually, we simply are not going to keep 
up with the rate of introduction of technology. Technology transfer then cannot be dependent on the 
essential hard, complete, and accurate kind of analysis that it should have in order to achieve its 
greatest utility. 
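
To make the point about two-year reviews concrete, here is a rough sketch of how much of a review’s material would still be current on the day it appears. It assumes that “half-life” can be read as simple exponential decay, which is an assumption made here for illustration and not a claim from the talk; the function name is likewise hypothetical, and the five-year figure is used as an upper bound for “less than five years.”

```python
def fraction_still_current(elapsed_years: float, half_life_years: float) -> float:
    """Share of a body of information still current after elapsed_years,
    assuming exponential decay (an illustrative assumption, not the speaker's model)."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (elapsed_years / half_life_years)


REVIEW_DURATION_YEARS = 2.0  # from the text: a review "now takes two years"

# Half-lives quoted in the text, in years.
HALF_LIVES = [
    ("earlier scientific average", 7.0),
    ("present average (upper bound)", 5.0),
    ("computer field", 3.5),
]

for label, half_life in HALF_LIVES:
    share = fraction_still_current(REVIEW_DURATION_YEARS, half_life)
    print(f"{label}: {share:.0%} of the material is still current when the review appears")
```

The shorter the half-life, the larger the share of a two-year review that is already out of date at publication, which is the sense in which such a review becomes a historical survey.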

Now there are some interesting counter-balancing effects introduced by technology. It’s a sort 
of “check-and-balance” effect. More information is being required for decision-making before the 
decisions are allowed to become final; therefore, there is a slow-down in the decision-making 
processes when introducing new products and determining the uses for new products. As an example, 
the drug industry, one of the fastest growing in the country, is regulated by the Food and Drug 
Administration. Certain requirements have to be met before a new drug can be introduced to the 
commercial market. The amount of information needed by the government - information largely
based on the results of experimentation - is such that the length of time before introduction of drugs
is increasing. As a result, the time which elapses until the drug is available to you, or to your
physician, is also increasing. The demands for additional information on an ever-increasing number
of new drugs act as a drag effect on the introduction of the technology. 

One of the major problems faced by any Information Analysis Center in attempting to keep 
up with technology and, at the same time, to make sure that its analyses are just as correct and 
comprehensive as before, is the storage requirement for the documents, and the information itself. It 
appears to computer technologists that the storage requirements are of several types: for storing 
documents themselves, for storing document surrogates, for storing document references, for storing 
the information and data that are generated, for storing the information and data elements that enter 
the system, for storing the management information necessary for Center operation. All of these are 
different. Attempts to use the same kind of storage mechanism for all of these requirements
introduce a situation that makes effective storage difficult. As was stated earlier, in no presently
operational system is the full text of all documents stored on computer media for search or retrieval. 
The normal reasons for this are, very obviously, the expense of converting from text to digital form, 
the difficulties of digitizing graphs and photographs, the cost of present computer storage and the 
unproven value of having full texts in digitized form. Unless techniques are developed for retrieving 
or manipulating texts based on content, there is not much reason for going to the expense of putting
text in digital form. 

The role of Information Analysis Centers now is such that there can indeed be an impact on 
their operation through computer use, even though only 21 to 28 percent of the Information Centers 
use computers, and these are generally simple uses. The situation is such that a little guidance from 
COSATI Panel 6 and a little guidance from computer scientists and information scientists can indeed
affect these Centers critically in the costs of their operation and in their plans for the future. I think
that one of the objectives of these Centers, as implicitly defined by the agenda for this meeting, is to 
determine how to keep up with the processing and analysis of information through the most effective 
use of computer technology. I heartily subscribe to that objective and I will be looking forward to
future progress towards that objective. 
I wish to express my appreciation to Edward L. Brady and Harvey Manon and their staffs
for their hospitality and arranging for my participation in this forum on the management of 
Information Analysis Centers. The COSATI Panel No. 6 on Information Analysis and Data Centers 
has supplied a definition of an Information Analysis Center (IAC) that is out of this world. I had not
read the definition until I started preparing for this talk. This definition is so broad and big, I believe 
I can say almost anything and it will be appropriate. The purpose of the IAC’s greatly overlaps the 
responsibilities and purposes of most of the Abstracting and Indexing (A&I) Services, particularly in 
the acquiring, selecting, storing, and retrieving of information and also compiling, digesting, and 
repackaging information. This is not all bad. Practically all scientific and technical information 
services must overlap with each other in order to establish a recognized continuum. It seems to me 
that my problem is to outline the policies for management of A&I Services and IAC’s so that there is
the minimum of duplication of effort, chiefly the intellectual input. Furthermore, we should establish 
a great dependence upon each other so that we have an excellent feedback to help guide the 
development of each of these types of services. 

Whenever I speak before a distinguished group of this caliber, I am afraid that I must be
masquerading around under false pretenses. I am not an expert in information handling, either in 
relationship to A&I Services or to IAC’s. However, I have been closely associated with the
development of the Chemical Abstracts Service since 1959 and have served as president of the ICSU
Abstracting Board for the last two and one-half years. Most of my time has been spent on raising
money and working on methods that would save and conserve the money. These then are my 
credentials as an authority on scientific and technical communications. 

Many times, we who are deeply involved in the information transfer process lose sight of our 
purpose. Our purpose is really simple. We are trying to serve the individuals who have need for the 
information. The attempts to regulate our operational procedures so as to serve the user have not 
been entirely unsuccessful. The user has to be educated to the new methods and doesn’t appreciate all 
of the things that are available. Furthermore, we scientists and engineers who generate the 
information are the principal users. Everything that I want to say has been said, or written, and all 
that I can do in this talk is emphasize some of the high points of the past. It is almost impossible at 
the present time to make an original contribution to this complex problem. 

There is a genuine desire in the United States, and worldwide, to standardize the methods for 
the transfer and dissemination of scientific and technical information. We have made good progress 
in this area of standardization during the last three years. May I ask you what do you think was the
driving force to promote this international cooperation? If you haven’t guessed, it was the lack of 
funds. We have been forced to cooperate in order to survive and carry out our missions. 

Who provides the money for all of this work? Again, the answer is simple. It is the public. It 
makes no difference whether it is a government-supported operation, a scientific or technical society, 
industry, or so-called not-for-profit foundations. The funds in every case come from the public. This
is a responsibility to the public and is not solely a scientific and technical society, government, or 
industrial responsibility. 

The greatest problem to me at the present time is how to price the services in a fair and 
equitable manner. We have all heard the problems of the primary journals. COSATI was very 
instrumental in arranging for page charges to help keep our primary literature viable. Most of the 
A&I Services of the world are in trouble because of the immense cost of mechanization. In 
practically all cases, the A&I Services are running parallel services using the old classical method of
hard copy while they are trying to develop more sophisticated computer manipulation of information.
It is not cheap to develop a computer data bank of information from which there can be instant recall 
or access. The question is how should we distribute the costs? Who should pay for what and how 
much? Also, very few people have tried to study in depth how to market scientific and technical 
information. I have a very wholesome respect for any commercial group that can survive in this field.

Combined with the cost of the storage and retrieval of scientific and technical information is 
the whole problem of copyright. Sometimes I envy the Russians and their VINITI operation. We 
have copyright of the primary publications with rights to the authors and editors. We have copyright 
for the A&I Services including their magnetic tapes, microfiche, software for the computers, and
finally, the compiling, digesting, and repackaging which may all involve innovative contributions. 
Sometimes I think it is a shame we just cannot ignore the whole copyright business. If I were the 
author of a few successful textbooks, I would have an entirely different viewpoint. Here again, this
was well discussed under the “Freedom of Information” act by COSATI Panel No. 6.

There are so many important areas in the handling of scientific and technical information that 
cannot be discussed in this paper, one of which is our research libraries. There is no doubt in my 
mind that, on a worldwide basis, they will be the real information dissemination centers. In fact, I am 
not too sure that some of these centers you have called analysis centers should really be called 
distribution centers. But you admit this.

At this point, I would like to say a few words about my own personal philosophy on the 
policy for handling scientific and technical information. This is highly colored by my own 
background which carries a chemical bias. The most effective way to handle information is to 
subdivide it into very small groups. Let each group handle their own information in the way that 
helps them the most. IAC’s are a fine example. This may sound like heresy for me to talk this way
when I have been so closely associated with one of the world’s largest A&I Services. I was very much 
impressed by the IEG’s (Information Exchange Groups), the so-called “invisible colleges” that were 
established by the NIH and then had to be discontinued because of their conflict with other 
established methods. They were so successful, I would like to see some modified form of them started 
again. The second thing which I believe is that well-operated A&I Services can, and should, supply 
the necessary bibliographic material for practically all mission-operated information distribution
centers. This also includes data. Third, the A&I Services will be forced to develop methods of 
classification where the index terms for any one service will be defined and understood by the other 
services. Fourth, I am not too impressed by the classification systems that have been designed to date 
and thoroughly believe that UDC will not be acceptable on an international basis. Fifth, I am not too 
impressed by efforts to develop multilingual thesauri. Probably my lack of knowledge influences my 
strong convictions. Finally, I am overwhelmed at the progress that is being made in transliteration of 
languages that do not use the Roman alphabet and the universal agreement which is being reached at a 
most rapid pace through UNISIST, ICSU AB, ISO, IFLA, FID, and a few dozen other organizations. 
This talk was designed to stimulate discussion, so now it is your turn. 
Information analysis centers must perform the function of evaluation as well as compilation, 
in order to generate products and services with increased utility and user acceptance. Also, these 
centers must perform a dual role as wholesaler and retailer. These roles, as well as problems and 
examples of production and marketing experiences, are examined so as to elucidate the present and 
future potential of information analysis centers in improving communication among scientists. The 
full potential will be shown to require marketing by information analysis centers operated by a 
complete spectrum of institutions, including governmental agencies, scientific and technical societies, 
not-for-profit groups, and commercial firms. 



Introduction 



In 1963, the Weinberg Panel on Science Information [1], with great foresight, envisaged 
information centers that were technical institutes rather than technical libraries. Such centers would, 
with the aid of dedicated and knowledgeable interpreters, “collect relevant data, review a field, and 
distill information in a manner that goes to the heart of a technical situation,” and thereby would be 
“more helpful to the overburdened specialist than is a mere pile of relevant documents.” The panel 
projected that such information analysis centers would eventually become “the prime retailers of 
information to scientists” [2]. 

This ultimate potential is apparent from the present developments of several different types 
of information analysis centers based both on the nature of the particular information base being 
covered and on the requirements of the user group to which the output of a center is primarily 
directed. Garvin [3] has summarized the scopes of such centers. As he indicates, the important 
factor in the Information Analysis Center Concept is evaluation and those products that result from 
it. 



Many centers, whose function is to process information already in the public domain, are now 
well established. Within the National Standard Reference Data System there are some 26 information 
analysis centers concerned with the review and evaluation of data in the physical sciences. In 
addition, there are almost a hundred other federally-supported analysis centers. 



Simultaneously, there are developing within industry comparable operations devoted 
specifically to internal company users and with coverage of both public and proprietary data. In 
addition, there are commercial services, both traditional and new, available at both a “wholesaling” 
level and also at a direct user “retailing” level. 



This report on the marketing of their products and services assumes as one of its basic 
premises the evaluation model of an information analysis center. Further, while the primary focus is 
on evaluated numerical data produced by the centers as distinct from documents, the latter type of 
product is important and will be referred to. In addition, since the user is the dominant factor, the 
production and marketing functions must be closely intertwined and directed to the ultimate user. 
Therefore, both production and marketing are considered in this report. 

Starting from these premises, what are the production and marketing limitations and 
opportunities? How can we successfully market products and services that have predominant 
characteristics determined in the production phases of those products and services? How do we 
grapple with the vast producer-oriented stores of data being generated by scientists and technologists? 
How can we best user-orient the data at information analysis centers? For that, in effect, is the next 
important phase between production and marketing that must be accomplished if we are to market 
the data. How do we work towards information analysis centers of the future as “prime retailers” to 
scientists? 

Problems of Production and Marketing 



a. Problems of Production 

The production problems of data compilations and evaluation (see Table I) are understood 
and have, unfortunately in some instances, become an accepted unsolved tradition among research 
workers. These problems must now be tackled and research workers must be involved in their 
solution if research and development work is to be kept efficient and effective. Today’s new solutions 
involve nontraditional along with traditional methods. 

It has long been recognized that it is much easier to do a piece of research and report on it 
than it is to review the literature and data in a critical manner and produce an authoritative review or 
data compilation. Many research professors have graduate students who are given individual research 
assignments and from whom research results can be monitored and evaluated. In the case of an 
authoritative review or data compilation, it is usually necessary for the professor or senior researcher 
to remove himself from the research environment, with little or no support from assistants, and to 
examine the information in a scholarly manner. The professional rewards have traditionally been 
larger for the research professor discovering new concepts than they have been for the same 
individual reviewing, evaluating, and compiling the data of others. 



Because reviews and compilations require special encouragement and support, the National 
Bureau of Standards established in 1963 a National Standard Reference Data Program. By means of 
this program, it is possible to have manpower for data compilations fully supported with Federal 
funds in a manner that has become traditional in the support of original research work. To date, the 
program is still in its infancy. While the Federal Government is supplying funds of the order of $300 
million for research in physics, the National Standard Reference Data System (NSRDS) is funding 
physics data compilations at an annual rate of less than $2.0 million. It would appear that more 
support in the latter area would yield high leverage in increasing the productivity of the former 
investment. 
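
The leverage argument can be made concrete with a line of arithmetic. The short Python sketch below uses only the two figures quoted above, both of which are approximate ("of the order of" and "less than"); the variable and function names are illustrative assumptions and are not part of the NSRDS program.

```python
# Approximate figures quoted in the text; all names here are illustrative only.
PHYSICS_RESEARCH_FUNDING = 300.0e6  # dollars per year, "of the order of $300 million"
COMPILATION_FUNDING_MAX = 2.0e6     # dollars per year, "less than $2.0 million" (upper bound)


def funding_share(compilation: float, research: float) -> float:
    """Return compilation funding as a fraction of research funding."""
    if research <= 0:
        raise ValueError("research funding must be positive")
    return compilation / research


share = funding_share(COMPILATION_FUNDING_MAX, PHYSICS_RESEARCH_FUNDING)
research_dollars_per_compilation_dollar = PHYSICS_RESEARCH_FUNDING / COMPILATION_FUNDING_MAX

print(f"Compilation share of research funding: at most about {share:.2%}")
print(f"Research dollars per compilation dollar: at least about "
      f"{research_dollars_per_compilation_dollar:.0f}")
```

On these rounded figures, compilation support is well under one percent of the research investment it is meant to make more productive.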
TABLE I 

Problems in Production and Marketing 

Problem Area    Problem                             Solution

Production      Inadequate Professional             1. NSRDS
                Encouragement and Reward            2. Review and Compilation
                                                       Fellowships

Publication     1. High Cost                        1. NSRDS
                2. No R & D Funds                   2. Proposed Journal of Chem.
                3. No Involvement of Science           & Phys. Ref. Data
                   Community

Technology      Uneconomic Computer Storage         Combination of Computer Tape
                and Accessibility                   and Microfilm Service

Marketing       Diversity of Products,              Successful Response to
                Applications, Users                 Marketing Challenge by
                                                    Government, Society,
                                                    Not-for-profit and
                                                    Commercial Sectors



While the Federal Government has recognized its responsibility to encourage reviews and 
compilations, various scientific societies are also recognizing their responsibilities in this direction. 
For this reason, the American Institute of Physics, in representing its seven member societies, plans 
to make a major “review and compilations” proposal to the National Science Foundation to obtain 
financial support. If this support is forthcoming, it is proposed that the U.S. physics community will 
be directly involved in providing fellowship funding to outstanding specialists so that these specialists 
can spend a sabbatical year at centers of their choosing to undertake specific review projects. Such 
centers will undoubtedly include many of the information analysis centers that are represented at this 
conference. 



b. Problems of Publication 

The second problem in numerical data compilations is their publication. By their very nature, 
reviews and compilations frequently result in articles that are longer and often more 
complicated and detailed than the primary research articles. Extensive tables of data are 
often very difficult to have published because of the attitudes of publishers and because of the 
lack of funding of authors. The authors and their institutions frequently have funds to publish the 
results of their research work, but do not have funds to publish the results of reviewing and compiling 
the data of others. 





A further aspect of the problem of publishing data compilations has to do with the lack of 
involvement of the scientific community in the publication process. In the case of primary research 
articles, the scientific community has established a system of referees who review, for acceptance, 
articles submitted for journal publication. In the case of data reviews, there is not as yet an accepted 
system for refereeing compilations and data values. This is because the compilations have not, in 
general, been published in the scientific literature operated by scientific societies. The result has been 
that the scientific community has not been involved in a formal way. Somehow the scientific 
community needs to extend peer acceptance and prestige to those who analyze, review, and 
compile, and also to those who referee the results prior to publication. 

Most of the results of the NSRDS program to date have been published by the Government 
Printing Office, and their availability has been announced in media not generally available to 
members of the scientific community on an individual basis, as are the primary research journals. An 
attempt to develop a solution for these publishing problems has been the recent proposal for joint 
sponsorship by the American Chemical Society, the American Institute of Physics, and the National 
Bureau of Standards, of a new journal, The Journal of Chemical and Physical Reference Data. This 
journal will be able to publish the reviews and compilations originating in the centers supported by 
NSRDS as readily as any research article. Under the proposal, the principal elements of the scientific 
community involved in the work of NSRDS will be intimately involved in reviewing, refereeing, and 
preparing data compilations as they are in the same functions for primary research articles. The 
societies also would take care of publishing and marketing. 

c. Problems of Technology 

Another problem is the technological one of how to disseminate numerical data. The same 
determining factors involved in document handling and dissemination are involved in data handling 
and dissemination. In the case of documents, the full text is not going to be 
disseminated in the form of a computer tape for a good many years to come. The case is similar for 
data, although there are now some examples of data being available in tape form for analysis and 
evaluation by the users. One example is the set of neutron data tapes being produced by the 
Brookhaven National Laboratory for use by reactor design groups. An interim compromise to 
disseminating the full text of documents on computer tape is the announced plan of the American 
Institute of Physics to produce a combination package of techniques. One part of the package will be 
a computer-searchable magnetic tape describing the complete bibliographical information about all 
the articles contained in full text on the second part — a microfilm tape issued every two weeks or 
every month, simultaneously with its computer tape counterpart index. As soon as The Journal of 
Chemical and Physical Reference Data has been placed in production, it will be available in this dual 
format. 

d. Problems of Marketing 

Government agencies, scientific and technical societies and not-for-profit groups, and firms in 
the commercial sector are all becoming involved in marketing information services and, especially, in 
marketing the specialized products of information centers. An understanding of the relationships of 
these different organizations and their respective types of marketing requires an awareness of what is 
encompassed in the new terms of wholesaling and retailing now beginning to be applied to 
information services. 

Wholesaling includes the production, evaluation, and marketing by, and for, the producing 
scientists as well as the serving of customers who in turn repackage or produce specialized 
information products for retailing to the ultimate user. Thus, if one is to have an economically stable 
industry, the criteria for determining economy, timeliness and quality at the retail level must reflect 
the equivalent criteria incurred in packaging and dissemination at the wholesale level. The extent to 
which wholesalers see their principal activity as information generation (through critical reviews and 
data evaluation) may determine the extent to which they consider themselves participating in the primary 
research and development function. In the words of the Weinberg panel report, “Transfer of 
information is an inseparable part of research and development.” However, transfer and dissemination 
without a contribution of evaluation does not appear to command a large value-added factor in the 
market place. 

Marketing has to be done not only by wholesaling to the specialist groups who require 
specialized services by agencies and societies, but by retailing to non-specialist public audiences and 
to specialist audiences of other specialties. To date, customized public retailing has been done 
primarily by the commercial sector. This sector deserves to be encouraged and stimulated in 
continuing in these areas for which it has particular capabilities and expertise. 



The major problem in marketing at both the wholesale and retail level results from the 
requirement to disseminate or deal with a wide diversity of data products, of access and application, 
and of secondary information generators and ultimate users. The dissemination, in turn, has to be 
done under conditions of economy, timeliness, and quality that are acceptable to the user. 

The marketing challenge is therefore to identify and reach the group of potential users even 
when this group is of a narrow scientific or technical discipline. Specialized libraries and information 
centers are possible marketing contact points. However, as previously indicated, not all potential 
users are linked to identifiable specialized libraries. Particularly in relating to academic users, it may 
be necessary to use broad marketing channels, such as professional journal advertising and broad 
library mailings [4] to ensure coverage of that user segment. The efficiency of this notification 
process becomes a significant factor in the cost of marketing and must be seriously considered in 
establishing a pricing policy which reflects full cost recovery, at least of the secondary dissemination 
costs. 



With this explanation of marketing and its major problem, let us consider other problems 
created by the acceptability criteria of economy, timeliness, and quality. 

1. Criteria of Economy (or Costs and Prices) 

There appears to be experience accumulating regarding what customers are willing, or 
perhaps better, are now conditioned to pay for information products and services. 
Table II lists types of information services involved in the handling of data compilations by 
the commercial sector. There are different types of compilations available with varying degrees of 
evaluation involved in the production process. As shown in the final column, the attitude of the 
market at the present time is that the charges for providing such data services cannot be much above 
the distribution cost level. The experiences undoubtedly reflect the user’s evaluation of the ease, 
ability and costs of reproducing the data himself compared to having the data supplied. If problems 
of acquiring the data by purchase are of a comparable order to reproducing the data, then the data 
will not be bought. If the research is Federally sponsored, then the threshold for the buy decision may 
be still lower. One may surmise that as the availability of sponsorship for research and development 
tightens, program managers will increasingly evaluate the full costs of data duplication and be 
prepared to buy when the data is available. 



TABLE II 

Data Compilation Products and Markets 

Product Type            Principal             Contact Channels        Competitive           Market Attitude
                        Identifiable Market                           Products              on Value
                        Segments

1. Unevaluated Data     * Research            * Profess. Soc. Memb.   * Journals (Special   Users value the
   Compilations           Peer Group          * Univ. Dept.             Issues)             information only at
                                                                                            the distribution
                                                                                            cost level.

2. Evaluated Data       * Research            * Profess. Soc. Memb.   * Publishers          Users appear to
   Compilations           Peer Group          * Business                Monographs          value the information
                        * Industry              (SIC Groups)          * Material            at the distribution
                          Design Eng.         * Univ. Depts.,           Suppliers           cost level plus a
                        * Education             Libraries               Catalogues          small return to the
                                              * Spec. Libraries       * Handbooks           expert to partially
                                                                                            cover his cost of
                                                                                            commentary.

3. Combination Data     * Industry            * Business              * Specialized         As 2 above.
   with Expert            Business              (SIC Groups)            Newsletters
   Forecast               Planning
   (principally           Function
   economic)

4. Engineering Design   * Industry            * In-House              * Usually             Most buyers are
   Data (usually          Engineering           Distribution            Non-Marketable      suspicious of anyone
   proprietary)           Function                                      Due to Proprietary  offering this type
                                                                        Content             of product.
Since run-off and distribution costs appear to fix the level at which users consider it economic 
to buy compilations, how can pre-run costs be met? The whole question of cost recovery in the 
dissemination of scientific data has been the subject of a study by an ad hoc Panel on Marketing of the 
Numerical Data Advisory Board of the National Research Council, a study related specifically to 
the products of the National Standard Reference Data System. In a memorandum by the Panel 
transmitted to the Director of the National Bureau of Standards, the following two recommendations 
were made: 

“1. The Panel recommends that the scientists engaged in the important and necessary work 
of data evaluation should be supported by the Government, in similar manner to the 
Government’s position in funding primary R & D scientists; this work should not be a 
target for cost recovery. 

“2. The Panel recommends that the accepted page charge concept for R & D results be 
applied to the publication of NSRDS products as well. In practice, some (if not all) of 
the pre-run costs of publication of data compilations should be considered for Federal 
support.” 

These recommendations are consistent with the concept that publication, like compilation and 
evaluation, is a necessary part of research; and that users can be expected to pay for data 
compilations at a rate that covers run-off, distribution, and very little, if any, of the pre-run cost for 
evaluation. 

2. Criteria of Timeliness and Quality 

Just as with costs, the criteria on what the specialist, as well as the non-specialist, will be 
willing to afford in time delays and in quality are determined, primarily, in subtle ways at the input 
by production standards and criteria. A prize example in the present context is the extensive growth 
during the 1960’s in the use of preprints and governmental reports that competed with the more 
conventional information transfer mechanisms available in the primary research journals. These 
journals have been, and will continue to be, produced by both society and commercial publishers. As 
the journals have become bigger, more costly, and more delayed, other communication mechanisms, 
such as the preprints and reports, were invented to bypass the problems of the journals. 

It has become recognized that preprints and reports are very effective ways of communicating 
quickly to a specialized audience. However, these mechanisms are extremely costly and are, in 
general, non-public communication mechanisms. There have been concerns expressed that these 
mechanisms have been getting out of control, and will result in the disappearance of journals in their 
present form. However, the multiple advantages of journals and the increased attention to costs and 
timeliness are resulting in renewed recognition that the conventional, proven techniques in the form 
of journals must be strengthened in order to accomplish wide, public, and economic dissemination of 
scientific information. 




Examples of User-Oriented Data Products and Services and Their Marketing 



a. Handbooks 

The conventional method of bringing comprehensive data compilations to the market place 
has been by the publication and marketing of handbooks. Such handbooks have traditionally been 
published by commercial publishing houses or by specialized subsidiaries. (See Table III.) 

In such publications, masses of data covering a broad scientific or technical discipline are 
compiled and arranged in an accessible form for the user. The compilation is then published in book 
form [5]. By including within the one publication many sets of data which cover a broad spectrum of 
users, the publication has broad market appeal. 

The data in many cases represent standard values having a useful life-time (to the user) of 
several years. Thus a specific edition is not immediately outdated on publication, and by bringing out 
new editions every two or three years, the publisher sustains a continuous impact on the market. 



TABLE III 

Examples of User-Oriented Data Products and Services 

Item                      Characteristic              Example                       Traditional Publisher

1. Handbooks              Compilation of Data in      Handbook of Chemistry         Commercial
                          Broad Scientific            & Physics
                          Discipline Published
                          in Book Form

2. Data Subscription      Initial Set of Data         1. F&S Index of Corp.         Commercial
   Services               Followed by Updates            & Ind. Monthly
                                                      2. GE Data Books on
                                                         Heat Transfer and
                                                         Fluid Flow

3. Individual             Determined by Data,         NBS Report of                 Government,
   Compilations           Author, Institution,        Superconductive Materials     Society,
                          Publisher                                                 Commercial

4. Specialized            Proprietary or Otherwise    GE Eng. Mat. & Process        Commercial
   Compilations           Restricted                  Info. Service

5. Data Bases of          Secondary Services          SPIN, CAS, and                Society and
   Literature             on Computer Tape            Commercial Services           Commercial


b. Data Subscription Services 



Over the last decade, specialized data services have been developed and marketed on a direct 
subscription basis. Included in this category are services for which the user receives an initial set of 
data followed by updating revisions or extensions on a pre-arranged periodic basis. From time to 
time a new up-to-date comprehensive data base is issued which supersedes all earlier editions. Such a 
service is attractive to the user whenever the data values change with time or in time and where the 
market places a premium on up-to-date validated data values. 

One major segment of data services of this type covers economic or technical-economic fields 
where new data values become available at fixed calendar dates. For such services, quarterly, 
semiannual, or annual data values are important to users. Reference 6 is a typical example. 

In most technical fields, data values do not become outdated or superseded quite so fast so 
that periodic updates, where they occur, are much less frequent. One example is the recently 
introduced series of data books on heat transfer and fluid flow each of which is marketed on a 
subscription basis [7]. With each service, an annual updating of the data is included in the 
subscription price. This annual updating includes the addition of new sections as well as the revision 
of existing sections. 

A more recent development of such data services has been the provision of the data to the 
user in a computer-accessible form. This may be either by the provision of data on a computer 
magnetic tape or by a computer accessible data service. For the magnetic tapes, the subscription 
covers periodic updates or supplements, and with many services of this type there are specific 
computer programs available or provisions for user education and training. In the case of the 
computer-accessible form of service, the cost may be made up of a fixed subscription plus a variable 
amount based on monthly access usage of the data base. An example of this type of data base is one 
on organic chemical compounds [8]. Data are supplied to the user either on magnetic tape for 
in-house manipulation or through a remote-access computer terminal. 
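
The two-part charge described above for the computer-accessible form of service, a fixed subscription plus a variable amount based on monthly access usage, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the choice of access hours as the unit of usage, and every dollar figure are hypothetical and are not taken from any actual service.

```python
def monthly_charge(fixed_subscription: float,
                   rate_per_access_hour: float,
                   access_hours: float) -> float:
    """Fixed subscription plus a usage charge proportional to monthly access.

    All parameters are hypothetical; the text states only that the cost
    "may be made up of a fixed subscription plus a variable amount based
    on monthly access usage of the data base."
    """
    if access_hours < 0:
        raise ValueError("access_hours cannot be negative")
    return fixed_subscription + rate_per_access_hour * access_hours


# Hypothetical figures, for illustration only.
light_user = monthly_charge(fixed_subscription=100.0,
                            rate_per_access_hour=5.0,
                            access_hours=2.0)
heavy_user = monthly_charge(fixed_subscription=100.0,
                            rate_per_access_hour=5.0,
                            access_hours=40.0)

print(f"Light user: ${light_user:.2f}; heavy user: ${heavy_user:.2f}")
```

Under such a scheme the occasional user pays little more than the fixed subscription, while the charge to a heavy user grows in proportion to use of the data base.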

The principal problems involved in marketing data subscription services involve the 
identification both of potential users and also of the most effective channels to make the availability 
of the service known. In addition to the ultimate users of these data (scientists and engineers), there 
are related services (libraries, information centers, and computer centers) whose personnel also have 
interests as intermediate handlers of the information. Marketing techniques, therefore, involve 
brochures, mailings, and advertisements, and in the case of computerized services, may also include 
demonstrations and exhibits at scientific and professional meetings, one- and two-day invitational 
demonstration and training institutes, and on-site demonstrations and trials. 

c. Individual Compilations 

In many instances, a specific compilation of data is published by itself. The form of the product 
depends on the scope of the data, its author or editor, the sponsoring institution, and the 
characteristics of the user group for which it is intended. 
Examples can be cited where the finished product is of such size and broad market impact 
that a recognized publishing house will publish the compilation in book form [9]. In other instances, 
the compilation is more appropriately published through the sponsoring agency as a monograph 
[10]. Recently, specialized compilations of data have become available on magnetic tape [11]. 

In this type of individual compilation, the compiler of the data is very often aware, 
professionally, of the principal generators of the data and in many instances they, in turn, are aware 
of the compiler’s assignment and responsibility [12]. In fact, the compiler or editor has a 
professional responsibility to evaluate and select the data prior to incorporating it in a publishable 
data base. The result is that the editor or compiler performs a gate-keeping or quality control 
function on data values which, to a large extent, become accepted in the profession [13]. 

The difficulties associated with the marketing of such data bases arise from delineating all 
potential users other than those who are data generators themselves. As we have indicated, the latter 
group are known professionally to the compiler, and communications arising during the compilation 
and interpretation process often occur directly. Identification of other potential users is less 
straightforward. While one can list general disciplines or sub-disciplines that should be concerned 
with the data, the specific identification of individuals in colleges, industrial laboratories, or 
government agencies, who would or should have a direct interest, is very difficult. Thus a major 
marketing effort is required to attract the attention of these potential users to the availability of the 
compilation. 

Many times in the past, when the sponsor for the compilation of the data has been a Federal 
department or agency, then the publication and marketing activities have occurred through the U.S. 
Government Printing Office (GPO) and the Office of the Superintendent of Documents. It is now 
clear that potential users have not always been aware of the availability of such publications, since 
they personally may not be exposed to the GPO document listing, and they may not always have 
local librarians or other information center personnel aware of their specific data interests. 

To overcome such gaps in coverage requires such things as advertising in professional 
journals, direct mailing to university departments or to companies in specific industry classifications 
and, whenever possible, secondary advertising through newsletters, etc. All these methods will be 
recognized as inherently inefficient since they employ broadcast techniques to communicate with a 
narrow interest group. 

An alternative marketing approach is to seek to develop on an individual basis a list of names 
of the potential users for each data base. Hopefully, this list grows as the data base itself becomes 
more complete and comprehensive. Direct advertising to these users then becomes a more efficient 
marketing technique, though it may miss many potential users of the data. 

A recent challenge, particularly for Federal agency sponsorship of such compilation and 
evaluation activities, is for the sponsor to demonstrate the broad social value of such data 
compilations by market place criteria. In particular, if the data compilation and evaluation functions 
are recognized as research and development activities to be Federally sponsored, as such, then the 
utility of their output should be evaluated by the extent to which they satisfy a significant segment of 
the recognized potential market at a price level which covers at least the marketing and distribution 
costs. 



This latter is, of course, most easily recognized when a commercial publisher is willing to 
serve as the publishing channel. This has recently happened with the multi-volume compilation, 
“Thermo-physical properties of matter - The TPRC Data Series,” now in the process of being published 
[14]. 

d. Other Specialized Compilations 

There are certain data compilations in existence for which the distinguishing characteristic is 
that they are considered highly proprietary, or otherwise restricted by security, to a particular 
company or organization. Almost by definition, such compilations are not for sale or release to the 
public or to others on an individual basis except by specific authority. Marketing problems are at a 
minimum. However, consideration is occasionally given to making such compilations accessible to a 
wider public. Dominant factors in the consideration are the identification of the market for the data 
and recognition of the marketing channels through which to contact such groups. 

For example, compilations of preferred design data on materials are created in many 
engineering design organizations in industry. Once created, the question is occasionally asked as to 
whether such an information base would not be saleable especially to manufacturers in related 
industries. Usually the answer is that such information is too sensitive for proprietary reasons to 
release. Occasionally the decision is made to offer such a system for sale. In that case, the marketing 
challenge becomes one of identifying corresponding industrial users and establishing contact 
channels. 

One such example of this type of data base is the Engineering Materials and Processes 
Information Service (EMPIS) [15]. This is an extensive information bank covering descriptive data 
and specifications for manufacturing materials. The service was test marketed for three years, but is 
not presently offered outside the company producing it, though it continues to be an internal system 
within that company for material specifications. One of the peculiar marketing problems encountered 
in the test marketing of EMPIS concerned the inability of the potential users of such information 
(design engineers) to convince appropriate top management that the subscription cost of the service 
was a necessary expense. 

e. Data Bases Covering Scientific and Technical Literature 

A recent report [16] has presented the results of a survey of commercially available 
computer magnetic tape services which can provide libraries and information analysis centers with 
data bases of scientific and technical literature. This directory lists the general characteristics of each 
data base, the most frequently used access points, the frequency of the tape issues, and the number of 
items reported on an average tape issue. 

This particular report is the result of cooperation by a special interest group of a scientific 
society, the American Society for Information Science, and the American Institute of Physics. It is 
to be anticipated that similar compilations of available data bases in other areas will become 
available through journal articles and other media. 

User Access to Data Compilations: The Test of Successful User-Orientation 



In the traditional priced form, a data compilation is immediately accessible to the user once 
he has located the volume either on his own bookshelf or in the library. The existence of xerographic 
copying has further reduced any tedium that there may have been in transferring specific data values 
to his personal information files. As the volume of primary research information has grown, most 
scientists have been forced into a mode of selectivity of exposure to the literature, resulting in a 
decrease in awareness of pertinent information. This the Weinberg Panel foresaw, and it postulated the 
development of intermediate information centers for subgroups of users. 

In the long-range plan [17] for an information system for physicists, this type of center was 
envisaged as an integrated information control center. Its major function would be to monitor the 
interests of user groups in subdisciplines and interdisciplinary combinations relating to physics and 
astronomy, and to devise and operate procedures for manipulating its files to provide references 
and back-up documents for dissemination to users. When one adds the function of information 
analysis, the generation and publication of topical status reports and annotated bibliographies to 
supplement conference proceedings, the center expands beyond the concept of a conventional 
library or information store to a technical information institute which would attract consultant 
scientists and visiting scholars to engage in the preparation of reviews and compilations. 

If one can forecast the effects of further significant decreases in the costs of information 
transfer through present day land-line or microwave communication channels, augmented by 
communication satellites and cable T.V., one can speculate that there will develop a close, direct 
relationship between the user and his particular information analysis center, regardless of geographic 
distance. 

It is the direct user interface which is most crucial to the effective working of information 
transfer systems and it is one where our efforts to date have made little headway. In most instances, 
the user now, in answer to his query for some factual data, is invariably given a series of detailed 
sign-post instructions to original papers. Copies of the papers are not attached and his library is 
invariably some distance away. Consequently he loses enthusiasm for schemes which tell him how 
well the primary generators are doing, while he must still hope that his problems will not be 
forgotten. 

Information in the public domain will need to be made more accessible to user enquiry. 
There are many approaches (key-word indexing, subject identifiers, machine methods of self-indexing), all 
directed toward more rapid query access. This the user is coming to expect, though he may require 
considerable education regarding the price level at which such service can be offered. 

Another area of direct concern to the user is the collection, organization, and dissemination 
of data within his own research environment, whether that be a research institute, commercial 
company, or government agency. Convenient methods of standardized data collection are required 
with corresponding convenient access methods for co-workers with related interests. Many fields of 
research now appear to have reached the point where organized data stores would enable researchers 
to expand the scope of their own research studies with little increase in cost, and, thereby, increase 
their research productivity. 

These are problems that information analysis centers and others who seek to participate in 
this new industry must address themselves to if they are to retain the interest and support of the user. 
We are convinced that these centers will be able to solve these problems and to fulfill the need for 
evaluated data and knowledge compilations. The Weinberg Panel should be credited with being a 
major force in encouraging the appropriate development of centers. It pointed the way toward 
avoiding, in the future, the stifling effects of the avalanche of information on individual research 
workers. 

CONSIDERATIONS IN ESTABLISHING A COMPUTERIZED FILE 

Introduction 

I would like to discuss this topic in terms of a series of questions that the prospective 
establisher of a mechanized file system should ask himself and to which he should provide tough, 
considered answers. 

1. “Do you need a computer system?” 

2. “If so, what kind?” 

3. “Can you afford a computer system?” (i.e., can you pay all the associated costs, not 
just the purchase price?) 

4. “Is your data in condition for computer processing?” 

5. Lest these seem all pointed in the negative direction, “Can you get along without a 
computer?” 



To get into more detail, we should start with consideration of the data and work back toward 
consideration of the need for a computer. 

Preparing Data for Computer Processing 

More than one information activity has discovered that the cost of converting existing files to 
machine readable or processable form can be the dominant cost in the development of an information 
system. By readable, here, I mean physically sensible to a machine; by processable, I mean 
sufficiently comprehensible to permit the machine to act upon data. These are quite different 
concepts. 

Here is an example. One study of computer applications at the Patent Office estimated the 
minimum cost to convert their entire existing file of 6 million patent documents to machine readable 
form at $180 million, or about $30 per document. Or, consider the problem of converting the Library of 
Congress catalog, a file that cannot be taken out of service and which contains some handwritten records, 
not able to be read by any automatic reader. These, and other information operations, would face the 
problem of low return on initial investment in a computer system, until a sizable portion of the files is 
converted. 
Not always a happy prospect. 

Information items which can be read and interpreted by humans may not be able to be read 
by a machine. In fact, if we think of business files we often find that records contain cryptic notes 
which serve to recall the real information, which is stored in the mind of the clerk, perhaps the very 
clerk, seemingly expensive and inefficient, whom you are trying to replace with a machine. We must 
face the question, then, “Can you actually do without human analysis as a part of the retrieval 
process?” In recognition of this problem, modern information systems are increasingly relying on 
interactive techniques, so the human observer can remain in the loop, even with a mechanized 
information retrieval system. Also, it is the failure to recognize this question that, in commercial 
systems, causes so many of the irritating problems of computer-generated invoices: dunning letters 
for a zero-balance account, for example, or failure to remove a disputed item from an account after a 
verbal agreement to do so, because “there is no way to change the machine.” Looked at in this way, 
many information files are more complex than they may appear and are less susceptible to totally 
mechanized processing than the energetic computer salesman might realize. 



Do you have all the data that you need for your file? That is, is your collection or file complete? 
If not, how are you going to get it? Can you get it? Can you operate with an incomplete data base? 
Whether because the data is not yet assembled or because of conversion delays, is it possible for the 
proposed system to operate with a partial data base? Can you afford the complete hardware system 
before you assemble your complete file? If not, is there compatible software that will enable you to 
start with a smaller computer? Or, can you rent computer time? 

Are there dissemination restrictions on your data that might affect the performance of your 
proposed system? Decide now how you are going to handle matters of privacy or security. Design 
your system around these restrictions. Do not postpone consideration of these restrictions until it is 
too late to do anything about them. 

There are, of course, many information systems in which these problems, or most of them, do 
not arise. These are mostly systems characterized by the use of volatile data - data which is not 
stockpiled for any great length of time. This eliminates problems of conversion and gives the user the 
chance to change his procedures for creating or collecting the data to suit the requirements of the 
information system. An example is an airline reservation system. There is no great wealth of 
historical data to contend with here: schedules change, and reservations, once used, need not be 
accounted for. Still we find, even here, occasional references to problems. One is the releasability of 
information. The traveller who is pleased that the computer system keeps track of his business 
associate travelling companion, assuring that the two can be seated together even if they board an 
aircraft at different times, may not be so pleased with this infallible memory if the companionship is 
not on a strictly business basis. 

It is not unusual for the cost of input preparation, including continuing handling as well as 
initial processing or conversion, to account for half the total cost of operating an information system. 
Yet, the subject rarely gets half the attention. It is not glamorous, but it is extremely, and in data 
processing, supremely important. 

Paying the Price 

Can you afford a computer system? Purchase price or rent is the most obvious cost in 
acquiring a computer and is perhaps the easiest to anticipate. But there are other costs, including: 

—A staff of systems analysts, programmers, operators; people with high salaries, 
poorly-defined job functions and high turnover rate; 

—Space devoted to the computer and its staff; and 

—Replacement cost. Just as airlines cannot continue to use aircraft throughout their 
mechanical life, because of the pressure of competition, computers relatively rarely remain in 
place throughout their mechanical lives. As new generations of machinery are developed, the 
cost of maintaining the old, provisioning them (spare parts), finding programming staff 
willing to stay, and exclusion from the benefits of new software developed for the newer 
machines, all militate toward replacement, even where performance is apparently satisfactory. 
Traditionally, the cost per unit of computation has gone down rapidly with succeeding 
generations of computers, so that replacement has some attractions, but the cost of hardware, 
to a continuing operation, is not going to be limited to the initial investment. 

—Possible non-availability of data. While a computer dramatically increases the accessibility 
of data, as compared to manual search methods, when a computer is “down” there is no 
access to the data. On the other hand, while data in file cabinets is not as accessible, the file 
cabinets are rarely down. Computers are becoming more and more reliable, but the 
possibility of a total outage, even if only for a few minutes, always exists. What will you do in 
this eventuality? For a typical library or information analysis center, the answer probably is 
to just wait. But not everyone can do this. 

Another aspect of the availability problem is in the use of a time-sharing facility belonging to 
a contractor. This complicates your security and accessibility problem. Cases have been reported of 
theft of information through a remote console, where the owner of the file, the computer from which 
the data was stolen, and the thief are all at different locations. 

The communications network lying between user and computer introduces further reliability 
problems. Furthermore, storing your files remotely means you do not have direct control over 
physical access to file storage areas, fire protection, etc. 



What Kind of Computer System? 

At this point, we assume we have decided that the data were in good condition and that the 
costs were bearable. What kind of system should be obtained? This is obviously much too broad a 
subject to try to cover in any great detail. Therefore I will try to concentrate on software systems, on 
the assumption, which may not always be valid, that if we know what we want a computer system to 
do, hardware selection is relatively easy. But even in hardware, there is so little in the way of 
performance standards that selection is often more arbitrary than we would like. 

Software is the more difficult to evaluate because all of the problems of performance 
measurement are magnified, as compared with hardware, and because it falls upon software to take 
up the burden of the user who does not know enough about his data or his usage patterns. 
Mis-selection of software remains a widespread problem. The typical buyer does not know what 
questions to ask. The typical seller is unable to answer them, if asked. We have the further difficulty 
that, to a large extent, computer hardware performance is now dependent on the performance of 
systems programs which are supplied by the manufacturer but whose efficiency is independent of 
hardware features that may promise greater speed. If this software is inefficient or defective, the total 
system will have problems, and there is little the unsophisticated user can do about it. 

There are few information system users who make computer selection decisions based on a 
thorough analysis of the details of system software — the characteristics of the data access method 
which, for example, may have more effect on retrieval speed than the physical access speed of a disk 
unit. The particular software component of the computer’s operating system that manages data 
storage and retrieval on a disk has some important characteristics. These concern the sequence of 
records in the file, the method of indexing the records, the method of changing or deleting records, 
and sensitivity to changes in record sequence. The overhead involved in a multiprogramming or 
time-sharing monitor can wipe out the speed advantage of a new computer. How many users question 
this while evaluating raw computer speed? 

Available methods for evaluating software include “bench-marking” and simulation. 
Bench-marking is the testing of an application, usually with an approximation to the eventual 
software, and probably with an approximation of the eventual data files. The success of the method 
depends upon the success at approximating both the software and the files. The problem is that one 
cannot really know how successful the approximation is beforehand. Also, this tends to favor the 
large, rich bidder who can afford to set up a bench-mark, over the smaller company that may have 
better software, but no funds for elaborate demonstrations of it. Simulation programs are available, 
but these tend to simulate at a too detailed level. This can have the effect of making the validity of 
the simulation model dependent on the user’s ability to predict fine detail when he is unsure of even 
general parameters. Some examples of parameters which are hard to predict are: rate of query, rate 
of change, and the area within a file receiving the most change (if not uniformly distributed). 
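
To make the role of these parameters concrete, here is a minimal sketch in Python of a coarse bench-mark on an approximated file. Everything in it is an assumption made for illustration: the workload generator, the parameter names (`change_fraction`, `hot_fraction`), and the two idealized access methods, which merely count the records examined. It does not model any real operating system or disk unit.

```python
import random


def make_workload(n_records, n_ops, change_fraction, hot_fraction, seed=0):
    """Generate (operation, key) pairs for a synthetic file of n_records.

    change_fraction: share of operations that are changes rather than queries.
    hot_fraction: share of the file (its leading keys) that receives all changes.
    All three parameters are guesses the user must supply.
    """
    rng = random.Random(seed)
    hot_limit = max(1, int(n_records * hot_fraction))
    ops = []
    for _ in range(n_ops):
        if rng.random() < change_fraction:
            ops.append(("change", rng.randrange(hot_limit)))
        else:
            ops.append(("query", rng.randrange(n_records)))
    return ops


def probes_sequential(key):
    """Records examined by a front-to-back scan of a file sorted by key."""
    return key + 1


def probes_indexed(key):
    """Records examined when an index leads directly to the record."""
    return 1


def total_probes(ops, probe_fn):
    """Total records examined over the whole workload."""
    return sum(probe_fn(key) for _, key in ops)


ops = make_workload(n_records=10_000, n_ops=1_000,
                    change_fraction=0.2, hot_fraction=0.05)
sequential = total_probes(ops, probes_sequential)
indexed = total_probes(ops, probes_indexed)
```

Rerunning the sketch with a different `change_fraction` or `hot_fraction` changes the sequential figure considerably, which is the difficulty described above: the result is only as good as the guessed parameters.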

This introduces the subject of just how much the user knows of the operating characteristics 
of his system at the time he makes a computer selection decision. Let me list a few of the critical 
characteristics, repeating some of those just mentioned: 

— Usage rate of files; 

— Modification rates (not just additions, which often can be accurately predicted, but 
also changes); 

— Reliability-induced requirements for multiple copies of files, audit trails, file access 
protection; 

— Performance speeds (e.g., retrieval time) required; 



—Need for time-sharing or interaction (For retrieval? For file changes? If the answer to 
the latter is yes, do you understand the effect on the performance of the system? Was it available with 
the last time-shared system you saw demonstrated?); and 



—Are there standards to which you must conform within your organization? Your 
profession? (Of hardware, software, data structure or content.) 

Need for a Computer 



At last, we come to the crucial question: are you going to need a computer? This is not really a 
separate question; the answers to the previous questions will largely determine the final resolution 
of this one. My questions have been mainly intended to steer away the information system operator 
who does not really need a computer, and by this I mean a user who, however much he may desire 
one, is unwilling or unable to pay all the prices. The real problems come when an organization 
cannot pay the price but does not know it. 

The Benefit of a Computer 



Let us now consider the other side of the coin, the reasons why a computer is needed. Here 
are some questions which may bring out that need. 

Are you restricting needed services because you are unable to do the job by hand? (e.g. 
permit multiple file search, permit searching on other than the prime sort key, or perform iterative 
searching to help the user arrive at the best answer to his question). 

Are you insisting, against all evidence to the contrary, that information users are able to 
formulate a mathematical statement of their needs (query statement) precisely, on their first try, 
without an intimate knowledge of the content of the files? 

Are you providing the services your users want or the service they have learned to ask for? 
(i.e., are they adapting to what they think your limitations are, or are you adapting to their needs?) 

Are you able to make the changes in file content that are required? Do you know, from actual 
test, the quality of your files? 

As we can see, the “considerations in establishing a computerized file” are many and often 
complex. Ideal answers are rarely available, for two reasons: (1) Software suppliers and, to a lesser 
extent, hardware suppliers, are unable to predict accurately the performance of their products, and 
(2) users of information or managers of information services are unable to predict accurately the 
behavioral patterns of users, given a new form of information service. In other words, when the 
service changes, we frankly cannot predict what the changes will do to the user population. 

The last of these points is the most important. Basically, we in the information business are 
supplying a service to human users. By changing the quality, quantity or price of that service, we are 
going to change the performance of the users. The value of our service should be measured by the 
value of that change. It is the value of the change in user performance, then, that must make the final 
determination of whether or not a computer is justified. 

Abstract 

Solutions are presented for several of the problems encountered in handling scientific text in 
machine-readable form in small data centers. The problems discussed are the selection of an adequate 
character set for representation of scientific text, the essential and useful features of editing 
routines, and batch-mode information retrieval. 

Keywords: Chemistry; codes for information interchange; editing; information analysis centers; 
information processing; information retrieval; text handling. 

Introduction 

This paper describes how scientific text and data are manipulated in three information 
analysis centers at the National Bureau of Standards (NBS) that store their records on magnetic tape. 
The principal topics are the selection of an adequate character set, desirable characteristics of editing 
routines and the properties of a batch-mode retrieval program. We believe that our solutions can be 
used by almost any center that handles similar technical material. 

The three centers are the Chemical Kinetics Information Center, the Chemical Thermodynamics 
Data Group and the Data Center for Atomic and Molecular Ionization Processes. The 
descriptions are drawn from current practice in these centers and from the General Purpose 
Document Image Code System which they share. 

The remarks touch on only a few of the problems that confront the data center manager who 
must plan and live with automated records handling. But the topics covered are ones that deserve 
careful attention. 

If a theme runs through all of these remarks, it is that of a general approach to text handling. 
Programs must be adaptable to many types of records. Any type of device that produces machine 
readable records must be an acceptable input device. Any type of printer must be accessible. * 



* Based upon a paper presented at the Forum of Federally Supported Information Analysis Centers, May 17-18, 
1971, at NBS. 

The adoption of a “general purpose” approach in our data centers has several bases. First, the 
three data centers have very different needs. Their managers know how to do the jobs with manual 
methods, and are unwilling to degrade those methods in favor of the machine. The possible uses of 
machine records files were unpredictable. 

Secondly, computer hardware and software change rapidly. It appeared advisable to avoid 
being tied to any particular devices. 

Thirdly, although the three centers have different needs, many of their records-handling 
problems are identical. It appeared more economical to build a basic system usable by all and then to 
supplement this with special programs for particular applications. 

Finally, from the start there has been a strong desire to demonstrate that techniques for 
handling scientific text could be developed within a framework consistent with national and 
international standards for information interchange. That this can be done has been proved. 

The remarks are addressed to two audiences. Those about the selection of a suitable character 
set are addressed to all operators of centers that deal with scientific text and data. The message is: 
accept no compromises; they are no longer necessary. 

The remarks about editing and information retrieval are for a more select audience: the 
managers of small self-contained information analysis centers. A center with a staff of ten is large in 
this frame of reference. 

The small center of concern is one that must arrange for all its computerized services, either 
by buying or renting existing packages or by having programs written specifically for it. Probably the 
main purpose of this center will be evaluation of data. In its first few years of operation, it will need 
computer techniques, but not elaborate ones. 

The center that is, or can be, imbedded in a matrix of a computation center that provides 
many clients with a variety of text handling techniques is in a more favorable position. It can let 
somebody else worry about these details while it gets on with its main business. But even so, these 
remarks may be pertinent. They may help evaluate the available services. 



Input to a computerized file is a very large topic. It deserves, and gets, careful attention from 
data center managers. Planning at this point is important. The more carefully planned the input, the 
more effective the later use of the material. Also, input is the largest single task of a data center that 
collects material from an active field. As the sorcerer's apprentice learned, once you start the flow, 
you can’t stop it. 

Only a few facets of this subject are explored here. First, input should be easy. This means 
that the device used should be as much like a typewriter as possible. A typist, not a specialized 
operator, should be able to run it. She should be able to use all the techniques taught in typing 
classes. Today this means a typewriter that produces punched paper tape or magnetic tape or a 
cathode ray display. The typewriters are still more flexible than the CRD’s but the latter are 
improving rapidly. The keypunch is out. 

Second, the record produced in the computerized file should be reasonably independent of the 
machine that produced it. This can be achieved, but it is hard work. But the work is worth it. This is 
because input devices have improved greatly in the past five years and will continue to change. 
Machine independence of records makes it practical to change input devices, and to build a modular 
system. 

Machine independence and modular construction are illustrated in figures 1-4. Figure 1 
shows a single purpose system, in this case devoted to preparing printed output. Modular 
construction is shown in figures 2-4. Input from various devices is converted into a common 
numerical code, in our case the General Purpose Scientific Document Image Code (GPSDIC). All 
manipulation programs process these GPSDIC records. Output to specialized printers starts with 
GPSDIC records, not original input. It is simpler to program separately for input, output, editing and 
searching. One input program serves all of our input devices. It is tailored to each device by the 
insertion of translation tables. 
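
The idea of one input program tailored by translation tables can be sketched in Python as follows. This is an illustration only: the device names, the raw code values, and the common code numbers are invented for the example and are not the actual GPSDIC assignments.

```python
# Hypothetical translation tables: each maps one device's raw code to a
# common internal code number. All values are invented for illustration.
PAPER_TAPE_TABLE = {0x41: 65, 0x42: 66, 0x7B: 300}  # 300: made-up code for Greek alpha
MAG_TAPE_TABLE = {0xC1: 65, 0xC2: 66, 0x4A: 300}

UNKNOWN = -1  # marks a raw code that is missing from the table


def translate(raw_codes, table):
    """The single input program: the table alone is device specific."""
    return [table.get(code, UNKNOWN) for code in raw_codes]


def count_symbol(record, symbol_code):
    """A manipulation program: it sees only the common code."""
    return sum(1 for code in record if code == symbol_code)


from_paper = translate([0x41, 0x7B, 0x42], PAPER_TAPE_TABLE)
from_mag = translate([0xC1, 0x4A, 0xC2], MAG_TAPE_TABLE)
```

In this sketch the two devices yield identical records, so `count_symbol` and any other manipulation routine need no knowledge of where a record was keyboarded. Adding a new device means adding a table, not a program.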

Third, and most important, is the selection of a suitable character set. The set should be 
sufficient to permit input of the material handled by the data center, without serious approximation 
of the text. Our criterion is that it must be possible to input and store a scientific manuscript in the 
symbolism normally used. This, as it turns out, is sufficient for almost all other purposes. 

This subject is only now receiving the attention it deserves from the hardware experts. The 
upper case alphabet is insufficient. The 88 characters on a typewriter are not sufficient for physics, 
chemistry, mathematics or library practice. But 188 characters can be sufficient, and, with the 
features described below, can be wildly extravagant. 

The character set to be described is that for a General Purpose Scientific Document Writer. 
This was designed at NBS by B. C. Duncan [1, 2]. It has been realized in a line printer and in a 
prototype punched paper tape typewriter. Several commercial machines come close to having the 
necessary features to produce all of the symbols. Usually the missing symbols can be constructed by 
overstrikes, as is commonly done for the cent sign. The character set is the basis for the storage code used 
by our data centers. Figure 5 shows the type of text with which we must contend. 

The features incorporated in the General Purpose Scientific Document Image Code are listed 
and discussed below. 

(1) 188 primary symbols. 

These are shown in figures 6 and 7. Figure 6 is the American National Standard Code for 
Information Interchange (1968), together with its control set [3]. These 94 characters are supplemented 
by another set of 94 (figure 7, left hand side). This supplement includes Greek letters, mathematical 
symbols, diacritical marks, line segments for chemical display formulae and a few special symbols. 
Thus this code is an extension of ASCII 1968. 



(2) Each symbol may be used: 

— as a main line symbol 
— as a superscript or subscript 
— in each of seven modifications (type faces) 

(3) Binary combinations of symbols (overstrikes) are allowed. These may be used for 

— letters with diacritical marks 
— composite mathematical symbols 
— as desired by the user 

See figure 7, right hand side, for examples. 

Taken together, these features permit one to record almost all scientific text “in the clear”. 
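
One way to picture a stored symbol with features (1) to (3) is the following Python sketch. It is not the GPSDIC storage format, which is not given here; the class name, field names, and numbering are assumptions, as is the premise that level and type face may be combined freely.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("main", "superscript", "subscript")
NUM_FACES = 7      # seven modifications (type faces), feature (2)
NUM_PRIMARY = 188  # primary symbols, feature (1)


@dataclass(frozen=True)
class Symbol:
    code: int                         # hypothetical index of a primary symbol, 0..187
    level: str = "main"
    face: int = 0                     # hypothetical face number, 0..6
    overstrike: Optional[int] = None  # second primary symbol struck over the first

    def __post_init__(self):
        if not 0 <= self.code < NUM_PRIMARY:
            raise ValueError("code out of range")
        if self.level not in LEVELS:
            raise ValueError("unknown level")
        if not 0 <= self.face < NUM_FACES:
            raise ValueError("unknown face")
        if self.overstrike is not None and not 0 <= self.overstrike < NUM_PRIMARY:
            raise ValueError("overstrike out of range")


def single_symbol_variants():
    """Distinct symbols before overstrikes, if level and face combine freely."""
    return NUM_PRIMARY * len(LEVELS) * NUM_FACES
```

Under that premise, 188 symbols in 3 levels and 7 faces give 3,948 distinct single symbols before any overstrike is counted, which suggests why 188 characters "can be wildly extravagant".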

In terms of an input machine, or a line printer, the features (2) and (3) imply half line spacing 
and overstriking without erasure. Punched tape typewriters have these now, but they are still not 
generally available on cathode ray display devices. Surely they will be. 

The reader who studies this set of characters will find things that he does not like. Probably, 
there will be widgets and wiggly lines favored in his field that are missing. Any character set will have 
approximations. This set attempts to provide reasonable coverage of common needs and to include 
acceptable alternative representations for very specialized symbols. We suggest that it is sufficient for 
data centers. 

Use of an extended character set has an important impact on the internal operation of a data 
center. No longer must artificial abbreviations be introduced; Greek need not be written out; 
formulae are not confounded by putting subscripts on line. The instruction to the typist becomes 
“type it as it is”. The simplification in operating procedures is very great. 

Our experience shows that ASCII 1968 can be the basis for a practical text handling system. 
We hope that equipment designers will use the national standard code and its control set in specifying 
what an input or output device must be able to do. If they do, text handling systems will become 
more readily transferable from one machine to another and between data centers. 




Editing 

Records that are typed from rough copy require proof-reading and correction. These steps 
are part of the input process which, by successive approximations, produces acceptable copy. 

Records that are part of an archival file may be selected and rearranged to make up the text 
of a report. Editing is also necessary at this stage to correct overlooked errors, polish the text or to 
insert directions needed in specialized printing programs. 

One editing program in the General Purpose Scientific Document Image Code text handling 
system is used for both of these functions. It operates in batch-mode, but its features should be 
applicable to on-line editing. These features are cataloged below in two lists: the essential and the 
useful. The context in which the lists should be studied is the editing of an existing file. This is 
slightly different than the editing of material while it is being keyboarded for the first time. 

Essential Editing Features 



Delete lines 
Insert lines 

Substitute lines for existing ones 
Change fragments of text 

The fourth item, correction of fragments of text, may need explanation. It is used to alter a 
word, part of a word, or a phrase without disturbing the rest of the line. The book “1066 and All 
That” has an erratum that calls for the same procedure on a grand scale: “For pheasant read peasant 
throughout”. This technique is easily the most popular one in our data centers. It is almost always 
used when the text is complicated. The logic required for a general purpose fragment-correction 
routine can become involved, especially if it is to be efficient. It is well displayed in the 
SUBSTITUTE program in the EDPAC set 4 . 
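
The fragment correction described above can be sketched in a few lines. This is only an illustration in Python, not the SUBSTITUTE routine of EDPAC; the function name, its arguments and the sample text are assumptions made for the example.

```python
def substitute_fragment(lines, old, new, first=0, last=None):
    """Replace every occurrence of `old` by `new` in lines[first..last].

    Only the matching fragment is altered; the rest of each line is untouched.
    Returns the corrected lines and the number of fragments changed.
    """
    if not old:
        raise ValueError("the fragment to be replaced must not be empty")
    if last is None:
        last = len(lines) - 1
    corrected = list(lines)          # leave the caller's file unchanged
    changed = 0
    for i in range(first, min(last, len(lines) - 1) + 1):
        count = corrected[i].count(old)
        if count:
            corrected[i] = corrected[i].replace(old, new)
            changed += count
    return corrected, changed


# Hypothetical text, edited in the spirit of the erratum quoted above.
text = [
    "The pheasant revolt began in 1381.",
    "Each pheasant paid a poll tax.",
    "The king was not amused.",
]
corrected, n_changed = substitute_fragment(text, "pheasant", "peasant")
```

Restricting `first` and `last` to a single line gives the ordinary case of altering one word on one line.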

Data center managers should make sure that their editing programs include these essential 
techniques. They should be easy to apply. The criterion is that they be easy for a typist to apply day 
after day, not that the center manager can figure out how to make them work. 

Useful Editing Features 



Change interline spacing 

Reserve space between lines (leading) 



Reorder lines 



Justify and center lines 



Make up paragraphs from uneven lines 



Paginate as desired 



Insert “canned” headings 
Introduce typesetting commands 

These will be wanted once the type of text that is to be edited escapes from the straightjacket 
of a collection of single lines of information. Data center managers should see to it that such features 
can be added easily to their editing programs. 

Few of the details of the GPSDIC editing program are pertinent to this discussion. It is 
sufficient to note that the directions used to run the program are a series of commands each followed, 
if necessary, by lines of new text. The form of the commands is simple: 

Delete page 3 line 4 through page 3 line 17 
The required items are underlined. The form was borrowed from OMNITAB 5 . 
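
A command of this form is easy to interpret by machine. The sketch below is an illustration in Python, not the GPSDIC code. Because the underlining is lost here, it assumes that every keyword is required, that a command without a "through" part deletes a single line, and that the file is held as a list of ((page, line), text) pairs; all of these are assumptions for the example.

```python
import re

# Pattern for "Delete page P line L [through page P line L]".
DELETE_RE = re.compile(
    r"^\s*delete\s+page\s+(\d+)\s+line\s+(\d+)"
    r"(?:\s+through\s+page\s+(\d+)\s+line\s+(\d+))?\s*$",
    re.IGNORECASE,
)


def parse_delete(command):
    """Return the first and last (page, line) positions named in the command."""
    match = DELETE_RE.match(command)
    if match is None:
        raise ValueError(f"not a delete command: {command!r}")
    start = (int(match.group(1)), int(match.group(2)))
    end = (int(match.group(3)), int(match.group(4))) if match.group(3) else start
    if end < start:
        raise ValueError("the range ends before it starts")
    return start, end


def apply_delete(records, command):
    """Drop every record whose (page, line) position lies inside the range."""
    start, end = parse_delete(command)
    return [(pos, text) for pos, text in records if not (start <= pos <= end)]


# Hypothetical file: page 3 with twenty lines.
records = [((3, n), f"text of line {n}") for n in range(1, 21)]
kept = apply_delete(records, "Delete page 3 line 4 through page 3 line 17")
```

Comparing (page, line) pairs as tuples makes a range that crosses a page boundary work without special handling.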

Batch-mode editing has one major disadvantage. It is not possible to check the success of the 
edit until the entire run has been completed. A second pass often is necessary. This is a very strong 
argument for the use of on-line techniques. Our system has another limitation. It requires that the 
editing be done in sequence from the first line to the last. The result of this limitation is that very few 
editing records are prepared on punched paper tape typewriters (in contrast to the preparation of 
text). Instead, the corrections are prepared on punched cards. It is far simpler to check out a deck of 
cards (and add a few) than it is to edit an editing tape. 

Retrieval of Information 



A common reason for creating a large scale information base in machine readable form is 
that, later on, one may readily retrieve selected portions for various uses. If the file of information is 
carefully structured, and if what is to be retrieved is known in advance, the retrieval scheme may be 
tailored to the records. 

It may or may not be reasonable to hope that machine retrieval of information will be 
accurate and adequate, but it surely is folly to suppose that an on-going data center will know in 
advance what it will have to retrieve and how best to do this. 

It was this uncertainty and a recognition that the format of input used by our data centers 
would change from year to year (and from problem to problem) that controlled the design of our first, 
and possibly last, search program. 




The first design criterion was that any reasonable set of records should be legal input, that 
there should be no prescribed structure of the text. The second was that it should be possible to state 
the retrieval criteria for a particular search in logical Broken English. It was suspected that the 
formulation of correct Boolean statements would be beyond most users of the program. It is. Our technique, 
using formal grammatical phrases, is only marginally better. Both require very careful planning when 
the searching directions are complicated. 



The GPSDIC search system designed with these criteria in mind is described briefly below. It 
is a granddaughter of the BLOCKSEARCH program by Mrs. C. Messina 4. 

(1) The only required item of structure in the file is some repeating mark that divides 
the text into logical blocks suitable for examination separately. This “mark” need not be a special 
code. It can be any piece of text that regularly appears on the first or last line of a logical block of 
text. The maximum size of these blocks is dependent solely upon the amount of core memory 
available to store a block. At present we operate with a limit of forty 100 character lines per block. 
This appears to be adequate. 



(2) The search of a logical block is made on a character matching basis: words, 
phrases or fragments of words in the text are compared against a search list. 

(3) Either the entire block can be searched, or a part of it. In the latter case, the part 
to be searched is defined in the directions provided at run time. These directions may specify 
subsections defined by markers in the text or regions defined by character counts. 

(4) Several independent searches can be used in a single run. 

(5) Either the entire block may be printed out at the end of a successful search, or 
only a part of it. 
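
Properties (1) and (2) can be sketched as follows. This is a minimal illustration in Python, not the GPSDIC program. It assumes the repeating mark appears on the first line of each block, takes the forty-line limit from the text, and uses an invented sample file.

```python
def split_blocks(lines, mark, max_lines=40):
    """Divide a file into logical blocks; a line containing `mark` opens a new block."""
    blocks, current = [], []
    for line in lines:
        if mark in line and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    for block in blocks:
        if len(block) > max_lines:
            raise ValueError("block exceeds the storage limit")
    return blocks


def block_matches(block, items):
    """Character-matching search: each item must occur somewhere in the block."""
    text = " ".join(block)           # lets a phrase run across a line break
    return all(item in text for item in items)


# Hypothetical file; "AUTH:" regularly appears on the first line of a block.
sample_file = [
    "AUTH: Smith, J.",
    "TITLE: Oxidation of methane",
    "INDEX: gas phase; oxidation",
    "AUTH: Jones, K.",
    "TITLE: Pyrolysis of ethane",
    "INDEX: gas phase; pyrolysis",
]
blocks = split_blocks(sample_file, "AUTH:")
hits = [b for b in blocks if block_matches(b, ["methane", "gas phase"])]
```

Note that the match is on characters, not words, so an item such as "ethane" would also be found inside "methane".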

None of the properties stated above is unusual for a sequential search program. But the 
independence of the program from the structure of the records may be. This has meant that the same 
program is used for widely differing files, and even files constructed with no thought that they might 
be searched. Indeed, when asked how a file should be structured for searching by our program, we 
have very little advice to give. 

The formal structure of the search commands is illustrated below. Words or phrases 
underlined here are directions about how the search is to be made. Words or phrases enclosed 
between slashes are items to be sought. 

(1) A simple search. 

Find /methane/ and /gas phase/ and /oxidation/ end 

A search starts with find and stops with end . All three items must be present for success. The 
search stops as soon as one of them is found to be absent. 
(2) A search with alternatives. 



Find /methane/ or /propane/ and find /methyl radical/ or /ethyl radical/ end 

Here one of the first pair and one of the second pair must be present. Any number of items 
could have been connected by or. The second pair could have been connected by and (with different 
results, of course). A formal Boolean statement for this search would be much longer. 

(3) A sequence search. 

Find /CH4/ followed by /→/ end 

This would define CH4 as a reactant in a chemical equation (to the left of the reaction arrow). 

The general structure of search directions is suggested by these examples. Intimate groups of 
items to be found are connected by and or and not, or by or or or not. These groups are separated by 
“major connectives” such as and find, or find, followed by, and or followed by. The search is from the 
start of the list to the end. At each connective or major connective a decision is made whether 
irrevocable failure has occurred, or the patient is still alive. Several find ... clauses may be 
included in one run. They are independent of each other. The word end, in this case, appears only 
after the last clause. 
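
One way to read such directions by machine is sketched below. It is a simplified illustration in Python, not the GPSDIC search program. It handles a single find clause and assumes strict left-to-right evaluation within a group, case-sensitive matching, and only the major connectives and find, or find and followed by ("or followed by" is left out). The sample text is invented.

```python
import re

TOKEN_RE = re.compile(r"/([^/]*)/|([A-Za-z]+)")
MAJOR = {"and find": "and", "or find": "or", "followed by": "then"}
MINOR = {"and", "or", "and not", "or not"}


def parse(directions):
    """Split directions into groups of (connective, item) and the links between groups."""
    tokens = [("ITEM", item) if word == "" else ("WORD", word.lower())
              for item, word in TOKEN_RE.findall(directions)]
    if not tokens or tokens[0] != ("WORD", "find") or tokens[-1] != ("WORD", "end"):
        raise ValueError("directions must start with find and stop with end")
    groups, links, words = [[]], [], []
    for kind, value in tokens[1:-1]:
        if kind == "WORD":
            words.append(value)      # collect the words of a connective
            continue
        connective, words = " ".join(words), []
        if connective in MAJOR:
            links.append(MAJOR[connective])
            groups.append([("and", value)])
        elif connective in MINOR or (connective == "" and not groups[-1]):
            groups[-1].append((connective or "and", value))
        else:
            raise ValueError(f"unknown connective: {connective!r}")
    return groups, links


def eval_group(group, text):
    """Evaluate one group left to right; also report where its earliest match ends."""
    ok, end = None, None
    for conn, item in group:
        at = text.find(item)
        negated = conn.endswith("not")
        if at >= 0 and not negated:
            stop = at + len(item)
            end = stop if end is None else min(end, stop)
        value = (at >= 0) != negated
        if ok is None:
            ok = value
        elif conn.startswith("and"):
            ok = ok and value
        else:
            ok = ok or value
    return bool(ok), end


def search(directions, text):
    """Return True if the block of text satisfies the search directions."""
    groups, links = parse(directions)
    ok, end = eval_group(groups[0], text)
    for link, group in zip(links, groups[1:]):
        if link == "then":           # "followed by": look only after the earlier match
            if ok and end is not None:
                ok, later = eval_group(group, text[end:])
                end = None if later is None else end + later
            else:
                ok, end = False, None
        else:
            nxt, end = eval_group(group, text)
            ok = (ok and nxt) if link == "and" else (ok or nxt)
    return ok


block = "gas phase oxidation of methane by ethyl radical: CH4 + O2 → CO2 + H2O"
simple = search("Find /methane/ and /gas phase/ and /oxidation/ end", block)
alternatives = search(
    "Find /methane/ or /propane/ and find /methyl radical/ or /ethyl radical/ end", block)
sequence = search("Find /CH4/ followed by /→/ end", block)
reversed_sequence = search("Find /→/ followed by /CH4/ end", block)
```

The sketch does not stop at the first irrevocable failure as the program described here does; it evaluates every group, which gives the same answer at somewhat greater cost.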

In practice most of the records searched do have some internal structure that can be used to 
limit the material searched. The record shown in figure 5 is an example. The capitalized words at the 
left margin serve as dividers. A search for papers by certain authors can be made without scanning 
the entire record by using a scan ... to ... direction: 

Scan from /AUTH:/ to /TITLE:/ find /Smith/ but not /Wesson/ and scan from 
/INDEX/ to final find /hydrogen atom/ and /ethane/ write from /BRIEF:/ to /AUTH:/ end. 

In this example “final” is a direction that means scan to the end of the block. “First” is used in a 
similar manner. 
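
The effect of a scan direction can be sketched as the selection of a region of the block before any matching is done. The Python below is an illustration only; the function name, the use of None for “first” and “final”, and the sample record are assumptions, and a missing stop marker is assumed to mean scan to the end of the block.

```python
def scan_region(block, start=None, stop=None):
    """Return the part of the block from the marker `start` up to the marker `stop`.

    start=None plays the role of "first"; stop=None plays the role of "final".
    """
    begin = 0 if start is None else block.find(start)
    if begin < 0:
        return ""                    # start marker absent: nothing to scan
    end = len(block) if stop is None else block.find(stop, begin)
    if end < 0:
        end = len(block)             # assumption: a missing stop marker means "final"
    return block[begin:end]


# Hypothetical record with capitalized dividers at the left margin.
record = (
    "BRIEF: Rate constant for H + C2H6\n"
    "AUTH: Smith, J.; Jones, K.\n"
    "TITLE: Reaction of hydrogen atom with ethane\n"
    "INDEX: hydrogen atom; ethane; gas phase\n"
)
authors = scan_region(record, "AUTH:", "TITLE:")
index_terms = scan_region(record, "INDEX", None)
selected = ("Smith" in authors and "Wesson" not in authors
            and "hydrogen atom" in index_terms and "ethane" in index_terms)
output = scan_region(record, "BRIEF:", "AUTH:") if selected else ""
```

Here the author test never sees the title or the index terms, so a paper about the Smith reaction by another author would not be selected by mistake.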



The examples given above do not display all the features of the program, but they show all 
that is proper for a general discussion. Copies of the “instructions to users” and program listings will 
be provided to those interested. 

The question that should concern the data center manager is this: is this type of retrieval 
technique appropriate? It is claimed here that this technique is necessary, but that it is not sufficient. 
The program described permits selection of material from an unordered file on one pass. Thus it 
permits a search of the input accumulated by a center, without the need for careful logical 
rearrangement of the files. This type of program, if nothing else, is a backstop that is needed. It 
certainly is the first technique a center should develop. 

But in practice our data centers have wanted other tools. The Chemical Kinetics Information 
Center is the example. It maintains in hard copy an author index and an index arranged by journal 
reference. These classical tools are used very frequently to find specific items in the file. In addition it 
has a “keyword” or “descriptor phrase” index created from the (subject) index terms listed on each 
abstract in the file. Search strategy is often based on this listing. If there is very little information on a 
topic, this subject index is used as a set of “Uniterm” cards for manual retrieval. If the search is likely 
to be complicated, the subject listings help organize a suitable set of search directions for the program 
described above. 

The computerized files of this center are now of a size (about 10,000 records) that sequential 
searches are becoming costly. We intend to do most of our future batch-mode searching using an 
“inverted” subject index file, in the belief that this will be faster. If completely automated, this 
approach would mean a two pass system, which could be slower than the present one pass technique. 
However, this second pass (retrieval of the abstract or paper) will be manual most of the time. It 
usually can be done with shorter turn-around time than when using a batch-mode computer 
approach. Furthermore, we find that we almost always need retrieval of the hard copy. It is necessary 
to be sure that the article is pertinent to a request. Nothing turns off a user quite so rapidly as delivery 
of a batch of (to him) trash. 
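
The “inverted” subject index mentioned above can be sketched as a table from each index term to the records that carry it, so that a search touches only the listed records instead of reading the whole file. The Python below is an illustration with invented record numbers and terms; it does not describe the file layout used at the center.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each index term (lower-cased) to the set of record numbers carrying it."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(record_id)
    return index


def lookup(index, *terms):
    """Return the record numbers posted to every one of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical abstracts and their subject index terms.
records = {
    1: ["methane", "oxidation", "gas phase"],
    2: ["ethane", "pyrolysis"],
    3: ["Methane", "pyrolysis"],
}
index = build_inverted_index(records)
first_pass = lookup(index, "methane", "pyrolysis")
```

The first pass yields only record numbers; fetching the abstract or paper itself is the second pass, which the text expects to be manual most of the time.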

Concluding Remarks and Acknowledgements 

In 1971 each of the ideas expressed here should be either standard practice or obsolete. The 
information industry has developed techniques that go far beyond those described here. They are 
available in large systems, or in specific installations. Small information analysis centers need a rapid 
and purposeful transfer of this information technology in order to make them more effective. 
Improved techniques must, somehow, be made available to these small businessmen at a price they 
can afford. The goal should be to let them evaluate data, untrammelled by a necessity to develop 
methods for manipulating records. 

The programs and techniques which these remarks describe were developed primarily by 
three chemists, working as chemists, not information specialists. The system was designed by one of 
us (B. C. Duncan) and is based on his General Purpose Scientific Document Image Code. Most of the 
programs were coded by the present authors. Mrs. C. Messina, Office of Standard Reference Data, 
NBS, has made significant contributions in two areas: typesetting and development of EDPAC, from 
which routines have been adapted. Messrs. C. Albright, R. Chandler, and R. McClenon added 
special programs. OMNITAB, developed by J. Hilsenrath, et al., has been our model for many of the 
control statements. 
Figure Captions 



FIGURE 1. 


Diagram of a program dedicated to a single job: preparing and printing via 
computer. 

This example is dedicated to the printing of a report. A manual analog is a 
manuscript, a typist, and the typed copy. 


FIGURE 2. 


Diagram of a general purpose program that produces archival records not dedicated 
to any output devices. This structure is used in the GPSDIC system. 


FIGURE 3. 


Input/output independent record manipulation and file maintenance. 

All the techniques shown are independent of input and output. Ideally, they should 
be independent of the storage code. 


FIGURE 4. 


Diagram of a general purpose program that uses archival records to print on any 
available output device. Both the records and the output are independent of the input 
device used. 

In a modular system the choice of output device may be made long after the 
records are prepared. 


FIGURE 5. 


Sample record in GPSDIC. 

Output from a line printer developed at NBS to handle scientific text. The 
character set used is that in Ref. 1, fig. 4. 


FIGURE 6. 


The ASCII 1968 character set and control codes. Text and rules typed on a Model 
37 Teletype. The control set consists of the items in columns 0 and 1 plus “SP” 
(space) and “DEL” (delete). The remainder of the table shows the 94 printing 
graphics. 


FIGURE 7. 


Additional characters in the GPSDIC set. 

The Shift Out (SO) set of 94 is the complete array. The SO set of 32 is appropriate 
for a machine that can print 128 characters. The composites are binary combinations 
of characters. This figure is the current (1971) GPSDIC set. 

The character set has been modified slightly during the past few years, to include 
new uses and to bring it into correspondence with the International Standards 
Organization R 646. 
A CASE STUDY OF USER ACCEPTANCE OF AN INTERACTIVE RETRIEVAL SYSTEM, 
SOME THOUGHTS ABOUT CASE STUDIES, AND A THOUGHT ABOUT LEGITIMAZATION 

Dr. Don H. Coombs 
Director, ERIC Clearinghouse on 
Media and Technology 
Stanford University 



Rather than just report on our experience with one particular interactive retrieval system, 
Lockheed’s DIALOG, I would like to take a slightly broader view. By way of introduction I would 
like to philosophize briefly on different ways to evaluate retrieval systems. Then I will discuss 
DIALOG, and I’ll finish up by suggesting a new concept which may have relevance to information 
retrieval - to all kinds of information retrieval, batch processing as well as on-line. That concept, 
which I propose at least half-seriously, is Legitimazation. 

I. Introduction 

There are various ways to evaluate retrieval systems. My prejudice, coming as I do from the 
Institute for Communication Research at Stanford, is toward measures which involve people, toward 
behavioral measures. I don’t wish to suggest that these are the only worthwhile measures, but 
only that they can indeed be worthwhile. 

In trying to use behavioral measures to evaluate interactive retrieval systems, we are at a 
primitive level: the case study. (I consider what we did with DIALOG as primarily case studies, 
even though there was some scaling involved.) We are at the case study stage, although there are lots 
of more valuable approaches than the case study. But you do case studies when there is no better way 
to get a grip on a problem. 

The ideal alternative to the case study for evaluating interactive retrieval systems is obvious. 
You assemble all the available systems, a variety of data bases (although this may be ridiculously 
ambitious), and a relatively large number of representative users. Then, using a design to control for 
as many threats to validity as possible, you measure ultimate user satisfaction. 

The importance of working with “representative users” is often overlooked; if you want to 
generalize to the universe of potential users, you would be well advised to involve a probability 
sample of them in your testing. Most retrieval system evaluation is done by information retrieval 
specialists - by systems people or, even worse, by hardware people. At first glance that’s a rather 
good situation, like having an expert mechanic tell you how good a car is, when you’re interested in 
buying it. But that’s not a good analogy. A better analogy would be having General Motors tell you 
how good the Vega is — and that’s precisely what those colorful, and expensive, brochures that they 
give away in the showroom are all about. It may be a good way to sell cars, but it’s a poor way to 
evaluate performance. 



To top off that analogy, it would be like having the General Motors dealer evaluate the Vega 
for you when he himself drives nothing but Cadillacs. That is my way of suggesting that the designers 
of a retrieval system probably are a long way from being, themselves, representative of the potential 
users. 

Now, why am I not presenting a decent behavioral evaluation of different interactive systems 
here this afternoon? For at least one reason: It hasn’t been possible to set up such a procedure, to 
make many systems available, at the same time, in comparable (and realistic) circumstances. A year 
or so ago we had ERIC files available on three interactive systems at the same time, from computer 
terminals in our clearinghouse. There was Lockheed’s DIALOG, System Development Corporation’s 
ORBIT, and Stanford University's own SPIRES. But we really didn’t have anything like comparable 
situations, which would allow fair conclusions to be drawn. 

Just to illustrate why the situations weren’t comparable: The clearinghouse staff was more 
familiar with DIALOG, having had it installed first. And because of contractual arrangements, it was 
easier and cheaper to get DIALOG “up”. Neither ORBIT nor SPIRES was then available with a 
cathode ray tube for quick visual display, and the SPIRES system was still in process of development. 

Since that time there have been notable attempts to present different interactive systems in 
something like competitive situations, such as at recent conventions of the American Society for 
Information Science. One flaw has been that most of the systems were operating only with toy files, 
which leaves open a good many questions about real-life performance. 



Recently U.S. Office of Education personnel have gone through an exercise which approaches 
comparison of different interactive retrieval systems. They did this in awarding a contract for such a 
system, to be available at a number of east coast sites. It’s my understanding that Lockheed’s 
DIALOG, the system I will be describing today, won that contract. If any of you are interested in 
information on that project, I refer you to Harvey Marron or Chuck Hoover at the Office of 
Education. 

I think the reason I have gone through this introduction is so you would be sure I had no 
pretensions that what we did with DIALOG approached high science. We did the best we could, at 
the time. I find it encouraging that today we would, I think, do better. 

II. User Acceptance of an Interactive Retrieval System 

To turn to what we did, I need first to get on record the way DIALOG operates. Rather than 
a detailed description, this will be an extremely rudimentary explanation. 

The commands which allow the searcher to manipulate the file on the computer are relatively 
simple. Each of the special characters on the terminal keyboard above the numerals (such as @ and 
%) stands for one command or one type of manipulation. The principal commands used in searching 
are the EXPAND, SELECT, COMBINE, DISPLAY and KEEP. 

Briefly, the EXPAND command can be used to bring onto the CRT a “window” or a “page” 
of the alphabetical index where a particular term is located, giving the number of citations posted to 
the terms as well as the number of cross-referenced thesaurus terms listed for each. Each visible entry 
is marked with a reference number plus the letter “E” (E1, E8, etc.). The EXPAND command also 
allows the user to look at the thesaurus. (Both uses can be seen on page 1 of Fig. 1, which is an 
annotated terminal record.) 

The SELECT command allows the searcher to set aside for future use any terms which he 
wants to incorporate in his search. The typed terminal record indicates the identification or set 
number which is assigned to each group of documents as it is set aside. (See page 1, Fig. 1.) 

After selecting out every term in the file which relates to each concept in his search, the 
searcher can COMBINE (as on page 2, Fig. 1) the terms appropriate to each concept by adding (in 
Boolean logic, ORing) the terms together to create a new set. When the selected terms have been 
grouped according to concept, the concept sets are then COMBINED again, this time so that the sets 
intersect (in Boolean logic, the terms are ANDed). 

The resumes for the new, narrowed set of documents can then be brought to the screen one by 
one, using the DISPLAY command (see page 3, Fig. 1). The searcher may now “page through” on 
the CRT what he has retrieved, selecting (KEEPing) those documents which he will wish to examine 
further in hard copy or microfiche form. Finally, the results can be printed off-line, or they can be 
typed on the terminal printer. The format for this printout is also at the option of the searcher. 
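
The SELECT and COMBINE steps amount to ordinary set operations on the documents posted to each term. The Python below is only a sketch of that idea, not Lockheed's implementation; the class, its method names and the document numbers are invented for illustration.

```python
class Session:
    """Keeps the numbered sets that a searcher builds up during one session."""

    def __init__(self, postings):
        self.postings = postings     # index term -> set of document numbers
        self.sets = {}               # set number -> documents set aside

    def _store(self, documents):
        number = len(self.sets) + 1  # sets are numbered in order of creation
        self.sets[number] = set(documents)
        return number

    def select(self, term):
        """SELECT: set aside the documents posted to one term."""
        return self._store(self.postings.get(term, set()))

    def combine_or(self, *numbers):
        """COMBINE by adding (ORing): the union of the named sets."""
        return self._store(set().union(*(self.sets[n] for n in numbers)))

    def combine_and(self, *numbers):
        """COMBINE by intersecting (ANDing) the named sets."""
        return self._store(set.intersection(*(self.sets[n] for n in numbers)))


# Hypothetical postings; the numbers stand for documents in the file.
POSTINGS = {
    "REMEDIAL READING": {101, 102, 103},
    "BASIC SKILLS": {103, 104},
    "ADULT EDUCATION": {102, 104, 105},
    "ADULTS": {103, 106},
}
session = Session(POSTINGS)
remedial = session.select("REMEDIAL READING")
skills = session.select("BASIC SKILLS")
adult_ed = session.select("ADULT EDUCATION")
adults = session.select("ADULTS")
basic_concept = session.combine_or(remedial, skills)
adult_concept = session.combine_or(adult_ed, adults)
result = session.combine_and(basic_concept, adult_concept)
```

Because a combined set holds each document once, the size of a union can be smaller than the sum of its parts: here the basic concept set has four documents, not five.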

An attempt was made to get people in a variety of professional roles to sit down at the remote 
access terminal for two-hour sessions. The nine evaluators were: 

1. A researcher engaged in the planning, technical aspects and conduct of educational 
research projects (28, M). 

2. A motivational educator in private practice, working with children referred by schools, 
doctors, etc. (41, F). 

3. A graduate student in education who will return to district level to work (25, M). 

4. An M.D. engaged in psychiatric research and therapy (29, M). 

5. An assistant professor of linguistics and computer science (27, M). 

6. A university librarian directing library automation, and involved in developing a 
different on-line retrieval system (40, M). 

7. A professor of education teaching and doing research in educational psychology (52, M). 

8. An elementary school teacher (24, F). 

9. A secondary school teacher doing graduate work to assist him in developing media 
programs at his school (26, M). 
Fig. 1. Annotated terminal record of a DIALOG search. 

        IT/BASIC READING 

The index around the descriptor, BASIC ADULT EDUCATION, was expanded (E). No entries were 
posted to this term. However, BASIC READING was there and was selected (S) to form set 1. 
BASIC READING was then expanded by its reference number, E9, in order to display its thesaurus 
entries. There was nothing of interest. 

        IT/ADULT BASIC EDUCATION 

ADULT BASIC EDUCATION was expanded and then selected to form set 2. It was expanded again by 
the reference number to show the thesaurus entries. 

        IT/LITERACY EDUCATION 

The term LITERACY EDUCATION was found there and selected. 

        IT/REMEDIAL INSTRUCTION 
        IT/REMEDIAL MATHEMATICS 
        IT/REMEDIAL PROGRAMS 
        IT/REMEDIAL READING 
        IT/REMEDIAL READING CLINICS 
        IT/REMEDIAL READING PROGRAMS 

Next, REMEDIAL PROGRAMS was expanded, and it and a number of alphabetically related terms 
were selected (sets 4 through 9). 

        IT/COMPENSATORY EDUCATION 
        IT/EDUCATIONALLY DISADVANTAGED 

REMEDIAL PROGRAMS was then expanded by reference number. Two more terms (sets 10 and 11) 
were located in its thesaurus entries. 

        IT/COMPENSATORY EDUCATION PROGRAMS 

COMPENSATORY EDUCATION was then expanded by its reference number (E4) and one more relevant 
term was located in its thesaurus expansion (set 12). 

        E-IT/ADULT EDUCATIO 
        S-E2         13       8  IT/ADULT DEVELOPMENT 
        S-E5         14     231  IT/ADULT EDUCATION 
        S-E8         15     117  IT/ADULT EDUCATION PROGRAMS 

The searcher now turned to the adult aspect or concept of his search. Expanding ADULT 
EDUCATION produced three relevant terms (sets 13, 14, 15). 

        E-IT/ADULTS 
        S-E4         16     [?]  IT/ADULT VOCATIONAL EDUCATION 
        S-E6         17       1  IT/ADUT VOCATIONAL EDUCATION 
        S-E1         18      40  IT/ADULT STUDENTS 
        S-E5         19      87  IT/ADULTS 

Expanding ADULTS produced four terms to be selected (sets 16 through 19). 

        E-IT/BASIC SKILLS 
        S-E5         20      50  IT/BASIC SKILLS 

And finally, BASIC SKILLS, which relates to the earlier concept, was expanded and selected. 

        C-3-12/+     21     504  3+4+5+6+7+8+9+10+11+12 

Set 21 was created by the union or addition of sets 3 through 12. This set then included most of 
the remedial or basic education terms. 

        C-1+20+21    22     595  1+20+3+4+5+6+7+8+9+10+11+12 

Set 21 was then added to sets 1 and 20 to create the basic education concept group containing 
everything in the ERIC files indexed by one of these terms. 

        C-13-15/+    23     352  13+14+15 
        C-23+18+19   24     456  18+19+13+14+15 

Set 23 was created by adding some of the adult terms together. This process was completed by the 
addition performed in set 24, which now represents the adult concept in the search. 

Note that so far set 2 has been ignored, because it pre-coordinates the two concepts in the search 
and therefore should not be included in either concept set. 

        C-22*24      25      39  (1+20+3+4+5+6+7+8+9+10+11+12)*(18+19+13+14+15) 

Sets 22 (the basic or remedial education concept) and 24 (the adult education concept) were next 
combined (C) to form an intersection, like AND in Boolean logic, with the resulting set 25 
containing 39 items indexed by at least one term from each concept set. 

        D-25 
        K-25/1 
        K-25/2 
        K-25/3 
        K-25/4 

This set was then displayed (D) and the first four retrieved items examined one by one on the 
CRT. The relevant ones, in this case all that were examined, were set aside using the keep (K) 
command into a reference set for future attention. This reference set is arbitrarily numbered 99. 

        S-IT/ILLITERATE ADU   26      33  IT/ILLITERATE ADULTS 

In the examination of the first four items of set 25, a new term was turned up, ILLITERATE 
ADULTS, which had not been located earlier. This term was now selected directly (without going 
through the expansion). 

        C-26*22      27      21  (1+20+3+4+5+6+7+8+9+10+11+12)*26 

The resulting set 26 was combined by an AND operation with set 22 (basic education terms) to 
form set 27. 

        C-25+27      28      57  ((1+20+3+4+5+6+7+8+9+10+11+12)*(18+19+13+14+15)) 
                                 +((1+20+3+4+5+6+7+8+9+10+11+12)*26) 

The results of this combination and the previous one (set 25) were then added together to form 
set 28. Note that the number of items in 28 is not equal to the sum of the items in sets 25 and 
27. This is so because a combination creates a set of unique documents where no item is 
repeated a second time. 

        D-28 
        K-28/2 
        K-28/7 
        K-28/8 
        K-28/10 
        K-28/12 
        K-28/13 
        K-28/14 
        K-28/16 
        K-28/18 
        K-28/19 
        K-28/20 
        K-28/21 
        K-28/23 
        K-28/25 
        K-28/26 
        K-28/27 
        K-28/28 
        K-28/29 

Set 28 was then displayed item by item and the relevant ones set aside in the reference set. 

        K-28/30-57 

After examining 30 references and keeping 18 of them, the evaluator determined to keep all of 
the remaining 27 and proceed further with the search. 

        C-2-28       29     109  2-(((1+20+3+4+5+6+7+8+9+10+11+12)*(18+19+13+14+15)) 
                                 +((1+20+3+4+5+6+7+8+9+10+11+12)*26)) 

Set 29 was created by subtracting the items already examined from set 2, the ADULT BASIC 
EDUCATION set. This avoids duplicate printing of any items. 

A second aspect of the search was begun to turn up industrial and job training programs which 
dealt with basic skills. 

        ... expanded to its thesaurus entry. This produced ... 

        IT/OFF THE JOB TRAINING 

The seven industrial training terms were then combined by an OR operation to produce set 37. 
... 38 ... 16 and 17 which also relate to the same general concept. This sum was intersected 
with the basic education concept group (set 22) to get set 39. 

        ITEMS HAVE BEEN PRINTED 

Finally three prints (P) were initiated of sets 99, 29, and 39. Format 5, which contains the 
indexing, cataloging, and abstracts for each document, was chosen. After the print had been 
completed off-line at Lockheed the results were sent to the clearinghouse for forwarding to the 
evaluator. 



Each user was asked to come about 30 minutes before the time the system became operative, 
and at that time filled out a form giving information about himself. The user's questions about the 
system were answered, and he was shown the terminal. When the DIALOG programing was loaded 
in core and the ERIC document file made accessible on an IBM Data Cell, the visitor sat down at the 
console and performed all searching himself. He was coached throughout the session, and prompted 
to make use of different aspects of the system until he had some familiarity with it. 

After the session was over, there was a structured “debriefing.” The transcripts of these were 
coded to produce some relatively objective summary results, and the transcripts also provided 
verbatim answers. 

One big question to be answered was whether individuals with no previous experience could 
sit down at a terminal and, in a reasonably short time, use such a system effectively. The answer was 
yes. 

For the most part, evaluators were enthusiastic about their two-hour experiences. Before they
were prompted to comment on specific aspects of the system, the visitors were encouraged to put on
record whatever impressions they wished to report. The two aspects of the system which were most
frequently commented on were 1) its speed, and 2) the way it “widened horizons,” the way it
suggested other relevant areas of information or different approaches to the information originally
sought.

Some of the UNPROMPTED statements about the “horizon-widening”: 



“. . . It opened up new avenues for thought.”

“. . . It expanded areas that I hadn't considered as being related to the subject.”

“It had a sort of fallout of new ideas and possibilities.”

“I was amazed at . . . what possibilities it offered for further learning.”



After each of the nine evaluators had commented generally on DIALOG, he was asked to
specify the good aspects of the system. There were a total of 44 good points singled out. (This is a
little like compiling batting averages in Little League, but the total score would be 44 good points,
28 bad points listed.) Of greater value than the 44/28 breakdown is the finding that there were 25
different good aspects reported, 18 different bad aspects.



Six of the evaluators noted the speed of the system and its saving of user time. The next most 
frequent favorable observations were that being able to combine sets was very desirable (volunteered 
by 5 evaluators) and that using the system had opened new avenues for thought, or “widened 
horizons” (volunteered by 4 evaluators). Three of the evaluators commended the system for being 
simple to use, easy to work with.

The evaluators then were prompted to identify what they considered bad features, but they
were not asked leading questions. Delays in waiting for the system to accept and execute a command 
were singled out as a bad feature by four evaluators, as was the feeling that considerable experience 
or time was needed to master all the operating rules. 

Other critical comments: 

“Too many combinations of keys are needed to input one command.” 



“Having to build combined sets one step at a time, rather than using
parentheses, and doing it with one complex statement is inconvenient.”

“There is a great deal of ‘paging’ required on the CRT, because you
can only look at nine terms at a time.”



Changes in the DIALOG system, made since our study, have obviated those last two
criticisms.

Evaluators were specifically asked about the “pacing” of the system because some of us at the 
clearinghouse came to be critical of delays when it was necessary to wait to input the next command. 
Only four of the nine evaluators were at all critical of delay; most were so impressed by the 
performance of the system that any delay was of no consequence. One felt that such variation in 
pacing suggested that we were running the machine, rather than vice-versa. The general conclusion 
was evident: At least while learning to use the system, few persons are bothered by having to wait 
sometimes to enter commands. 

As already reported, the sessions were successful in locating relevant information for the nine 
visitors. The question or questions which they brought to the session were answered. But two other
features of the system were evident: Users were led to ask additional questions about the chosen area 
of investigation, and to pursue entirely different matters than those which originally concerned them. 

First I’d like to deal with the “intellectual fallout” or “horizon-widening” effect of the system. 
Seven of the nine evaluators reported that they had asked additional and different questions about the 
subject which they originally were investigating, and seven said that they came upon material on 
different, though related, subjects which they would like to pursue at a later date. 



Verbatims: 



“I began to realize that they had some articles of international scope
. . . [and that] kind of opened up that area . . . .”

“Well, as we began to look at the section on instructional television,
there were some related topics there that I hadn’t been aware of . . . .
There were a couple of topics that interested me, one in the area of
teacher training, which I just happened to run across, but I would like
to look into at another time, although it wasn't particularly related to
this study.”



“I formulated new areas to look under, for relevant research.”

“. . . I stumbled on something, I stumbled onto [specific] programing
languages.”

“. . . The biggest problem was staying with what I had originally
pursued, instead of getting off on other interesting things.”

“I did run across some other information [that I would want to pursue
later] . . . . We’ve got our mind on one thing, particularly when we’re
researching something, one phase, and we only think to look in certain
areas, and the thing that I liked about this, the machine points out that
there may be some other related information . . . . This is very
significant, very helpful.”

I myself assume that such “horizon-widening” effects are extremely desirable. I can’t imagine
a research administrator with such a narrow area of concern as to object to this aspect of a system.
To look at the situation from a different viewpoint, this aspect tends to bring the useful documents in 
the collection to the user’s attention even though his preconceived ideas of what is available are 
incorrect. 

Next, to consider the basic interactive aspect of the system: Six of the evaluators had
favorable comments on the way the system made it possible to monitor and modify searches, but the
other three should not be considered negative or indifferent. Most of the evaluators had no previous
experience with computerized retrieval systems, and so could hardly compare DIALOG with
non-interactive, batch-processing systems. Our feeling was that the evaluators reacted to DIALOG as
an entity, and that the overwhelmingly favorable general comments were to a great extent the result
of this very basic aspect of the system.

The individual verbatims: 

“It makes a big difference, because you get a feeling of control over
your search that you don’t have so much when you’re actually in the
library. There it’s hard to remember exactly which things you were
going to go back and do; you have to write things down, you have to
organize things. Here you have handy little systems for putting
something off somewhere and you just organize in your own mind the
very basic concepts.”

“It has a much more organizing effect, it helps to organize in a much
more effective way.”

“It helped me . . . . I think the fundamental help was I had some idea
of the amount of information I was handling, or would be handling if I
had it printed out. And it gave me some insight, too, into the amount
of research that had gone on in certain areas.”

“It’s like having a great mass of information at your disposal, where 
you can somehow set up and know where you are and how much 
you've looked at.” 



“I think it made a great difference.” 

Besides the nine case studies presented in this report, there are some observations on
DIALOG which are a product of its use in the clearinghouse for a variety of tasks. These
applications can be categorized as duplicate checking, preparing in-house projects, and answering the
information requests of visitors and other users who contacted us by mail or phone. To summarize
in succinct fashion, the system proved extremely valuable in such uses.

In all, 46 people had a demonstration and introduction to the system, 68 people had their 
requests searched by a staff member while they were present to interact with the system, modifying 
the search strategy as necessary, and 21 people (including the nine evaluators) were taught to use the 
system and had hands-on experience. These people who used the clearinghouse as a source of ERIC 
materials were able to do their literature searches efficiently. 

Perhaps most important here is the ease of use of the system. For people who are unfamiliar 
with computers and who have only a limited amount of time to devote to their professional research 
and to learning to use a new research technique, no matter how powerful it may be, it is quite 
important that the technique be simple to understand. Experience in demonstrating DIALOG and
instructing people in its use indicates that it is fairly simple and does not overwhelm the person 
unfamiliar with computers. 

It is interesting to note that the real difficulty in teaching people to use DIALOG had nothing
to do with the system itself. Rather, it was the concept of coordinate searching that proved to be
difficult. If the individual understood how coordinate indexing worked, it took only minutes to 
acquaint him with the few mechanical procedures which would allow him to search the file that way. 
However, the linear method of searching out materials is ingrained in most people, and time is 
required to help them understand coordinate searching. 
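
To make the contrast concrete, here is a minimal sketch, in Python, of coordinate searching next to a linear scan. The records and descriptors are invented for illustration and the function names are hypothetical; nothing here reproduces DIALOG's internals.

```python
from collections import defaultdict

# Hypothetical records: document id -> descriptors assigned by an indexer.
RECORDS = {
    "doc1": {"INDUSTRIAL TRAINING", "ADULT BASIC EDUCATION"},
    "doc2": {"INSTRUCTIONAL TELEVISION", "TEACHER EDUCATION"},
    "doc3": {"ADULT BASIC EDUCATION", "INSTRUCTIONAL TELEVISION"},
    "doc4": {"INDUSTRIAL TRAINING"},
}


def build_index(records):
    """Invert the file: map each descriptor to the documents carrying it."""
    index = defaultdict(set)
    for doc_id, descriptors in records.items():
        for term in descriptors:
            index[term].add(doc_id)
    return index


def coordinate_search(index, all_of, any_of=()):
    """Coordinate terms at search time by combining their posting sets."""
    if not all_of:
        raise ValueError("all_of must name at least one descriptor")
    hits = set.intersection(*(index.get(term, set()) for term in all_of))
    if any_of:
        hits &= set().union(*(index.get(term, set()) for term in any_of))
    return hits


def linear_search(records, all_of, any_of=()):
    """The ingrained approach: read every record from first to last."""
    hits = set()
    for doc_id, descriptors in records.items():
        has_all = all(term in descriptors for term in all_of)
        has_any = not any_of or any(term in descriptors for term in any_of)
        if has_all and has_any:
            hits.add(doc_id)
    return hits


index = build_index(RECORDS)
query = (["ADULT BASIC EDUCATION"],
         ["INDUSTRIAL TRAINING", "INSTRUCTIONAL TELEVISION"])
print(sorted(coordinate_search(index, *query)))
```

Both functions return the same documents; the difference is that the coordinate search never reads a record, only the posting sets for the terms named in the query.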

No idiosyncratic search strategies emerged in the nine case studies, and this was a
disappointment. I have a long-time, although seldom implemented, interest in cognitive structuring,
but a retrieval system tends to constrain search strategies. Any system does. It is designed to be used 
in certain ways, and so it is hardly surprising that people use it in those ways. A system designed 
specifically to investigate idiosyncratic search strategies is conceivable, but the essential flexibility 
and complexity would make it quite expensive. 

To sum up our experience with DIALOG, it was favorable indeed. But the whole
project, and especially our experience with the nine evaluators, undoubtedly was heavily
influenced by the Hawthorne effect. How the nine users would have felt about the system after its
novelty had worn off is something we don’t know.

Anyone wishing a more complete report on our experience with DIALOG can obtain the full
90-page document from the ERIC Document Reproduction Service, P.O. Drawer O, Bethesda,
Maryland 20014, as document number ED 034 431 (on fiche for 65¢, in hardcopy for $3.29).

III. Legitimization

In closing, I’d like to suggest half-seriously that mechanized retrieval systems are serving a
new function: that of Legitimization.

Let me give you an example of Legitimization, a worst case, at least as far as ethics are
concerned. Some U.S. Office of Education research contracts specifically require that literature 
searches be completed, for what I imagine are obvious reasons: because the investigators should have 
a good idea of what has been done already before they get their own projects underway. 

Several times, when we had DIALOG available in the clearinghouse, we were approached by 
educational researchers — or their graduate assistants, because that demonstrates how important 
literature searching is considered to be — and we were asked to perform an exhaustive search of the 
ERIC files for relevant material. In each of these cases the search was required as part of their Office 
of Education contract. And in each case, there was a great sense of urgency— because everything else 
about the project had been completed, and the report already had been written. 

Now that’s Legitimization in the worst sense: Using a retrieval system just so you can say that
you used it. It’s like not wanting to know how to cure sick people, but wanting an M.D. certificate to
put on your wall in a nice frame.

Why does machine search lend itself to legitimization so well? Because it’s easier (or seems
to be easier) to describe what was done. For example, “The complete ASDEC file was searched for
relevant documents using the Quest III system running interactively on our 360 Model One Million.”
That has great specificity, compared to “A graduate student spent three weeks in the library.” People
know that’s no good, because they know about graduate students and they know about libraries.
Being able to cite a mechanized search, in contrast, is like putting a certificate on the wall from a
good-sounding medical school.

This makes legitimization sound all bad, which I don’t think is the case. There’s a legitimate
use of Legitimization, if you will. And that is akin to someone buying insurance. If you’ve ever been
in a position to help someone search a file, and found lots of relevant documents, you may have
observed that when you laid the 300 abstracts on him, the person didn’t smile. The systems people
smiled, because look at all the relevant things their system produced. But the poor user didn’t smile.
Either he wanted the three or four most relevant documents, or else he wanted just Legitimization:
he wanted to find that there weren’t any relevant documents, so that he could go ahead with his work
and not worry about something like it having already been done, or about how it meshed
into any big picture.



It's all very well to say that research has to be conducted in a framework of previous research, 
so that findings can be hooked up, but it’s another thing entirely to complicate someone’s life with 
more potential hookups than he cares to deal with. (I am not speaking of how science and technology 
should operate; I am speaking of how people do operate.)

What our man, our last example, wants is assurance that he’s out in the clear and hasn’t 
overlooked anything. He’s willing to pay for that assurance— for that insurance — in money and in 
time for a machine search. In return for paying that premium, he is protected against disaster; if 
someone has done exactly what he’s up to, the blame falls not on him but on the Quest III system 
running on a 360 Model One Million. 

Now why spend time mentioning Legitimization? Because if that is a real function of a
retrieval system (if people, many times, don't want information at all, but do want insurance), then
that should be taken into consideration in evaluating retrieval systems. Most of our measures are
based on the assumption that the user wants great masses of output, and often, I think, that’s not true.

Let me put it another way: If we set up a committee to evaluate retrieval systems and the
committee members have certain standards in mind, and the superior system is chosen and provided
to users, the system is more likely to be successful if the standards of the committee are similar to the
standards of the potential users.

I think Legitimization is one of the functions desired by users. Maybe we should change our
standards, or maybe we should change our users. Changing one is probably easier than changing the
other, but I’m not arguing for a particular course of action. I’m just suggesting that some attention be
paid to the situation.



Abstract

A summary is presented of recent progress at NBS in the automation of book production
through the development of techniques for computer-assisted phototypesetting. The strength of the
system rests on general-purpose edit-insertion programs and other general-purpose programs which
accept a variety of input media. The programs take existing files on punched cards or computer
tapes; or Magnetic Tape Selectric Typewriter (MTST) cartridges; or files keyboarded on-line to a
time-shared text editing system; and transform them to match the requirements of the
phototypesetting system at the U.S. Government Printing Office (GPO).

Examples are shown of finished text consisting of upper and lower case Roman and Greek 
characters, subscripts and superscripts keyboarded on a variety of input devices. The examples are 
from input on punched cards, from a 44 key Selectric terminal and from a “scripting” teleprinter
capable of typing 126 characters in two colors in inferior, superior or main line positions.

Keywords: Computer-assisted printing, computer input, electronic typesetting, input techniques,
keyboarding conventions, phototypesetting, text automation.

1. Introduction

Little did the planners of this Forum realize when they assigned me the seemingly mundane 
topic of input that they would really be giving me carte blanche and that I would take the opportunity 
to sound off on a number of my pet peeves and talk about some of my favorite people. 

I find myself, increasingly in recent months, in the position of a doctor who is asked to 
prescribe a cure for a patient who sends a relative to the office with a description of his symptoms. 
My advice invariably is, come back with the patient, let me examine him, and then I can prescribe. 

My assignment today doesn’t allow me time to ask any of you to describe his data handling 
headaches. So, what remains for me to do is to describe to you a few of our more or less miraculous 
cures, even though I doubt that such a recital is any more ethical in the computing game than it is in
the medical profession. You are still entitled, I believe, to a brief explanation of the experience and 
the biases which underlie my comments this afternoon. 



*Based on a talk presented at the Forum of Federally Supported Information Analysis Centers, May 17-18, 1971,



When we joined the Office of Standard Reference Data about four years ago, we resisted the 
advice of colleagues to issue a cookbook for data and file organization and for bibliographic formats. 
We saw no profit in that kind of effort because generalities are of little use; to be more specific would 
require us to learn your job even better than you know it yourself; and finally, any dicta on our part 
would restrict the whole operation to the ingenuity of one person, or a small group of people. We
chose instead to build a tool kit of programs which could take any systematic file arrangement and 
play games with it, that is rearrange it at will to any one of a number of alternate arrangements or 
formats. We've had enough experience with a number of generalized programs to convince us that 
economical solutions of varied data handling, and typesetting problems as well, lie in general rather 
than in special purpose programs. 



We now have a number of interesting instances where an existing program, not at all intended 
for the new job at hand, solved that job more elegantly and more efficiently than we could have 
solved it had we addressed ourselves to the solution of that problem directly. Some of the 
publications that have gone through our typesetting programs at the Bureau have produced dramatic 
savings (about $3000 in one issue alone). If we did not have on hand a general purpose program able 
to cope with the specific requirements of that publication, the cost of writing and debugging an ad 
hoc program would easily have exceeded the savings from computer-assisted typesetting. 

Before I discuss our own programs, you should know what experience we have had with
programs developed by others. We had some good experience, a few years ago, with IBM’s Text-90
[1] system, and can therefore recommend its successor, Text-360, for many types of reports,
especially as the input now connects with a terminal on-line. But even card input to Text-360 is a
viable and attractive way of using a computer for document preparation. Those of you who have
360’s are advised to look into this text editing and formatting package. It has excellent page
make-up facilities and can be used to feed a phototypesetting process.



If I continue a bit further with a recital of our experiences, I should mention that we make
extensive use of on-line keyboarding and text editing, via an IBM 2741 terminal, into a number of
commercially available text-editing services that support IBM’s Administrative Terminal System
(ATS). IBM called it DATATEXT, a local outfit in Washington called it VIPCOM, a company in
New York calls it Word One. These are all minor variations of ATS. If you have an IBM 360
(Model 50 or up), and put in an ATS system, most of your text handling headaches will be solved
overnight at minuscule cost. A number of universities have made it available on their machines. As an
example, the University of Iowa offers ATS to its staff at $2.00 per hour connect time, plus an
appropriate charge for storage. Even at commercial rates of $3.00 to $3.50 an hour, which we now
pay, the system is viable. We use it extensively in keyboarding ordinary reports and for very fancy
computer-assisted typesetting as well. There are a number of examples in the exhibits here which
you might look at later. They represent a rather small fraction of the display we have prepared for
those of you who come back on Wednesday to see some of the work that has gone through our
automated systems. There will also be demonstrations of a variety of inputting techniques, both on-line
and off-line, utilizing MTST’s, 2741 terminals and a variety of Teletypes.

Most of the phototypesetting production that we’ve been involved in at the Bureau of
Standards has gone through the Government Printing Office on the Linofilm machine. That work is
fed through one or another of the typesetting programs of the Government Printing Office (GPO) for
which our programs provide input tapes. We are also making increased use of the Linotron 1010, a
much faster and more reliable machine. Most of that work goes through the GPO’s Master
Typography Program.

Numbers in brackets indicate references at the end of the paper.

For the last few years we’ve been developing our own software which often permits us to 
bypass the typesetting programs at the Government Printing Office and allows us to drive the 
Linotron directly. This has been a fairly exciting development which has cost us relatively little, 
about one woman-year, and has proved rewarding to the GPO, as well as to us. By us, I mean 
primarily the Data Systems Design Group, which I lead in the Office of Standard Reference Data, 
and the staff of the Computer-Assisted Printing Section in the Office of Technical Information and 
Publications, with whom we work very closely. If any of you have notions about the limitations of the 
Linotron for scientific text, please spend a little time with Carla Messina or Rubin Wagner before 
you leave Gaithersburg and learn how they have been able to tame the Linotron to do their bidding. 
Mrs. Messina’s software, which has been installed at the GPO, now makes it practical and efficient to 
set on the Linotron 1010 complicated technical material containing as many as 1020 different 
characters. 

On the way to the GPO, to paraphrase a popular title, we flirted briefly with a Photon
typesetter. By flirting, I mean that we put a few small publications through that machine. We are now
experimenting with a Stromberg Carlson 4060 computer-driven microfilm device at the Goddard
Space Flight Center. We also have a low priority interest in seeing whether our programs can be used
to drive other electronic typesetting devices. That interest stems from our desire to be of service to
information analysis centers that do not have access to GPO facilities, and must rely on commercial
phototypesetting services.



2. Punched Card Input



Much of our information, and yours, has already been generated on magnetic tape, from
punched cards, so that the question of keyboarding afresh is not a problem. A problem nevertheless
remains if one wishes to get away from the upper case character set so characteristic of what Dr.
Blanton Duncan calls Stone-Age printout. We have a potent medicine for that problem in the form of
a program called SETLST which is described and listed in NBS Technical Note 500 [2].

The programs KWIND and SETLST accept punched cards or magnetic tape records normally 
intended for line printers and produce a magnetic tape properly flagged and transformed to interface 
with one or another of the typography programs at the GPO. The result is graphic arts quality in 
upper and lower case and typeset in mixtures of typefaces (bold, italic, bold italic, small caps, etc.). 
Figures 1 and 2 show the typographic variety in products of this program produced from magnetic
tapes that contained only capital letters to start with.

In other applications of this program, words such as ALPHA, BETA, etc. (see Figure 3)
have been replaced by the Greek letters α, β, etc. When the material is in tabular form, even the letter
G standing alone in a fixed field can be recognized by the program to produce its Greek equivalent.
The generality of the SETLST program arises from the fact that it gets its information on how to
format a specific job from a set of control cards supplied at run time. The typographic information is
supplied only on the control cards; it is not contained in the program. We have other general-purpose
programs which accept card input and achieve fancy output. Figure 4 shows a portion of a table of
spectroscopic data phototypeset from punched cards containing just digits and capital letters.
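
A rough sketch of the table-driven idea follows, in Python rather than the language of SETLST. The rule tables below stand in for the control cards; their layout, the column positions, and the choice of gamma as the equivalent of a lone G are assumptions made for illustration, not the format documented in NBS Technical Note 500.

```python
import re

# Stand-ins for control cards (hypothetical layout): spelled-out names to
# replace anywhere in the text, and per-field rules for tabular material.
WORD_RULES = {"ALPHA": "α", "BETA": "β"}
FIELD_RULES = {(4, 9): {"G": "γ"}}  # columns 4 to 8, counted from zero


def replace_words(line, word_rules):
    """Replace whole words only, so that ALPHABET is left alone."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, word_rules)) + r")\b")
    return pattern.sub(lambda match: word_rules[match.group(1)], line)


def replace_fields(card, field_rules):
    """Replace a symbol standing alone in a fixed field, keeping its column."""
    for (start, end), mapping in field_rules.items():
        field = card[start:end]
        key = field.strip()
        if key in mapping:
            card = card[:start] + field.replace(key, mapping[key], 1) + card[end:]
    return card


print(replace_words("ALPHA AND BETA DECAY OF THE ALPHABET", WORD_RULES))
print(replace_fields("GE68  G  1077.3", FIELD_RULES))
```

Changing the job means changing the two tables, not the functions, which is the point the paragraph above makes about control cards.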

In spite of these striking examples, we recommend punched cards for input only in special
circumstances. Punched cards require either special paraphernalia (CAM equipment) or a batch mode
computer facility. We now favor on-line systems for keyboarding as these are becoming increasingly
available at reasonable cost.

Our recent efforts have, therefore, been directed to developing keyboarding techniques for
typewriter-like devices that provide readable copy while capturing the character stream in
machine-readable form, either on paper tape or (preferably) on the disk of a time-shared computer.

It should be emphasized that neither the computer nor our programs achieve the illustrated 
transformations on their own. Nor are such transformations always practical even if possible. If the 
data is not itself flagged, transformations are feasible only when the data base is suitably structured or 
otherwise systematic. Isolated exceptions to systematic transformations can also be handled if they 
are known to the person who is operating on the file. What is significant about our approach is that 
the details of the transformations are supplied in the form of control cards tailored to the job rather
than programs so tailored. 



3. Keyboarding on a Scripting Typewriter



The difficulty of keyboarding scientific text on primitive devices was illustrated most
dramatically by the New York Herald Tribune on January 31, 1929 and again the next day when
they published the full text - equations and all - of Albert Einstein's paper on the unified field theory.
The mathematical equations were translated into words which were cabled along with the text. The
equations were reconstructed from the words, were written by hand, and were printed as line
drawings.

This unprecedented and still unequaled journalistic scoop itself attracted enough attention so
that the editors were moved to explain in detail how the formulas crossed the Atlantic Ocean over
ordinary telegraph cables, since “cable codes are equipped only for the transaction of human
affairs in ordinary arrangements of letters and numbers . . . and not for . . . complex arrangements
of Greek, Roman, and Gothic letters used in mathematical formulas.”

The problem, stated so well in 1929, has remained unsolved for over 40 years. Certain
abstract journals still spell out Greek characters and reduce mathematical formulas to a linear
notation. Only in the last two years have machines become generally available on the market that are
capable of generating and transmitting a code structure that can handle scientific text in its full-blown
glory (to borrow a phrase from Dr. Garvin). In Figure 5 we see an excerpt from the Einstein paper as
it appeared in the Herald Tribune and as it would be keyboarded on, and transmitted by, a Model 37
Teletype today. The transmission can be to another Teletype device or directly to a computer. We
have pieces of this text stored on the computer at Dartmouth College and can retrieve them at will.
The next time I have an opportunity to retell this story, I should be able to show how this portion of
the Einstein manuscript looks when listed on the high speed printer, developed at the Bureau of
Standards, that Dr. Garvin alluded to, and how it looks after it is phototypeset.

We now have in process two major publications - an article for a mathematical journal and a 
book on statistical designs - which have served as a test bed for one of our newer systems for
automated publication. In this system material is prepared on the Model 37 Teletype which has 
forward and reverse half-line indexing for subscripts and superscripts; can type 126 different 
characters (including the Greek alphabet), and punches a paper tape consonant with the typed copy. 
After all corrections have been made in the paper tape, it is converted to magnetic tape from whence 
it is run on the computer into the GPSDIC system which produces a computer output on the 
extended character printer (GPSDIC train). This computer printout contains sufficient information to 
serve in place of a conventional galley proof. Errors that are discovered at this stage can be corrected 
in the batch mode. When the galley is deemed satisfactory, the material is run through a number of 
programs developed by Mrs. Carla Messina for justification (without hyphenations) and for
processing to produce a magnetic tape ready to mount directly on the Linotron 1010 at the GPO. 
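
As an illustration of what justification without hyphenation involves, here is a minimal Python sketch. It is not Mrs. Messina's program: it assumes every character has the same width, whereas a phototypesetter works from per-character width tables, and the function name and line measure are hypothetical.

```python
def justify(text, width):
    """Fill lines with whole words, then pad the gaps so lines reach `width`.

    No word is ever split. A word longer than `width` gets a line to itself
    and simply overruns the measure.
    """
    lines, current = [], []
    for word in text.split():
        # Start a new line if adding this word would exceed the measure.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)

    justified = []
    for number, words in enumerate(lines):
        is_last = number == len(lines) - 1
        if is_last or len(words) == 1:
            justified.append(" ".join(words))  # set ragged, not stretched
            continue
        gaps = len(words) - 1
        spaces = width - sum(len(word) for word in words)
        base, extra = divmod(spaces, gaps)
        # The leftmost `extra` gaps each receive one additional space.
        pieces = [
            word + " " * (base + (1 if position < extra else 0))
            for position, word in enumerate(words[:-1])
        ]
        justified.append("".join(pieces) + words[-1])
    return justified


for line in justify("the quick brown fox jumps over the lazy dog", 16):
    print(line)
```

Because nothing is hyphenated, a short measure combined with long words can leave very wide gaps; that is the price of skipping hyphenation.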

Experience gained with these pilot publications has confirmed our basic preference for 
on-line keyboarding over paper tape operations and especially for on-line editing instead of off-line 
paper tape correction followed by batch mode editing. 

The availability of “scripting” teletypewriter devices interfacing with suitable teleprocessing
computers will, I believe, be recognized soon to offer solutions to many text-processing problems that 
have heretofore been characterized in the literature as “unsolved”. 

These devices for which a few determined pioneers have been waiting nearly 10 years should 
have an important impact on computer usage beyond text processing. They make it possible to enter 
mathematical problems into the computer in natural mathematical notation for direct computations 
along lines that have been spelled out clearly in the literature since 1963. I refer to the work of M. B. 
Wells at the Los Alamos Scientific Laboratory [3,4], H. J. Gawlik at the Royal Armament Research
and Development Establishment [5], and M. Klerer at the Hudson Laboratories [6,7].

In these systems it is often sufficient to feed the computer the statement of the problem rather
than its solution. In Figure 6 we see an example of a problem stated in terms of words and symbols 
natural to the discipline in which the problem arises. In the MIRFAC system that problem statement 
is all that the computer requires to obtain the solution. This is not an isolated instance. The system 
handles problems of much greater mathematical complexity with equal facility. Klerer and May have 
written a compiler which is uniquely suited to the solution of mathematical problems involving 
complex display formulas. In Figure 7 we see again a computer program which is simply the 
statement of a clearly defined mathematical problem. 

Now that suitable input devices are available at a reasonable cost, we would hope to see such 
compilers implemented on more ubiquitous computers, so that the time we engage in “computing 
without programming” will exceed the time we spend in “programming without computing.”



4. Scientific Text on 44 Keys



Since our efforts to automate book production at NBS started when the Model 37 Teletype
was in its early development, we settled for an input device which, though it was more primitive in its
character set (88 characters) and had no scripting capability, could be connected to a
commercial on-line text-editing service. We considered that the advantage of an economical on-line
text-editing system outweighed the recognized “disadvantage” of using a 44 key typewriter, and
devised a simple and now clearly viable keyboarding convention for handling scientific text without
compromising the notation. The use of an existing software-hardware combination (IBM’s ATS)
allowed us to turn our full attention to the design and implementation of a comprehensive software
package as an interface between the archival tape produced by the ATS system and the typography
programs at the GPO. As the original motivation for our mechanization was the preparation of a
large abstract publication, the software system builds author and keyword indexes using the
typesetting flags to identify items to be indexed.

An example of the notational complexity which is achieved routinely by the system, which for 
lack of a better name we have dubbed the 44 key system, is shown in Figure 8. The typographic 
information is conveyed in this system in two ways. The systematic use of boldface for titles and 
volume numbers and of italics for journal abbreviations, as well as the variety of indentions, is 
controlled by preceding each of these portions of the text by a different number of tabs. Other 
typeface changes occurring within the title or the abstract are provided by a system of overstrikes. 
Thus, any character overstruck by a / turns on the italic face; an equal sign turns on the boldface, etc. 
A comma and a double quote used in this fashion produce subscripts and superscripts respectively, 
and a right parenthesis produces Greek characters. Figures 9 and 10 afford a comparison of two 
keyboarding techniques. 
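The overstrike convention just described can be sketched as a small decoder. This is a minimal illustration in Python, not the NBS software: it assumes that an overstrike reaches the program as the sequence base character, backspace, marker character, and that the base character itself remains part of the text. The token names are hypothetical, and since the text does not say how a face is switched off, the sketch does not model that.

```python
BACKSPACE = "\b"

# Marker characters named in the text and the typographic change each one signals.
OVERSTRIKE_FLAGS = {
    "/": "italic",
    "=": "bold",
    ",": "subscript",
    '"': "superscript",
    ")": "greek",
}


def decode_overstrikes(stream):
    """Split a keyboarded stream into ("flag", name) and ("text", chars) tokens.

    Assumption: an overstrike is recorded as base character, backspace, marker.
    """
    tokens = []
    text = []
    i = 0
    while i < len(stream):
        is_overstrike = (
            i + 2 < len(stream)
            and stream[i + 1] == BACKSPACE
            and stream[i + 2] in OVERSTRIKE_FLAGS
        )
        if is_overstrike:
            if text:
                tokens.append(("text", "".join(text)))
                text = []
            tokens.append(("flag", OVERSTRIKE_FLAGS[stream[i + 2]]))
            text.append(stream[i])  # the overstruck character is kept as text
            i += 3
        else:
            text.append(stream[i])
            i += 1
    if text:
        tokens.append(("text", "".join(text)))
    return tokens
```

A downstream typography program would consume the flag tokens as typeface or position changes, and an indexing pass could use the same flags to pick out the items to be indexed.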

The typing convention for the 44 key system, which has been in productive use for nearly 
four years by the staff of the Computer Assisted Printing Section and a number of Technical 
Divisions, has produced approximately 5000 typeset pages of published output. The manner in which 
the system handles the notational complexity of NBS manuscripts, coupled with the advantage of 
on-line operation, accounts for much of the acceptance that the system has gained at NBS. For those 
groups at NBS who do not share our preference for on-line keyboarding and editing, the Computer 
Assisted Printing Section is equipped to convert MTST (Magnetic Tape Selectric Typewriter) 
cartridges to computer readable magnetic tape to feed into our edit insertion programs. The 
keyboarding convention is the same for both the on-line use of ATS and the off-line use of MTST 
machines. 

Successful as the 44 key system has been for abstract bulletins and conference proceedings, it 
is not a system we would recommend to individual authors for preparation of manuscripts involving 
chemical and mathematical expressions. For that purpose a scripting typewriter with a fuller 
character set provides the author with a manuscript fully as legible as we have all become 
accustomed to seeing come from the hands of a capable typist on a conventional typewriter 
augmented perhaps by special keys, or more recently on a typewriter with interchangeable type 
spheres. Such copy is in fact produced on the Model 37 Teletype. Unfortunately, currently available 
ATS text editing services do not accept input from ASCII coded terminals. Until we have the ability 
to connect a Model 37 Teletype to an economically viable text editing system, we must cope with 
paper tape corrections off-line. When we are able to do on-line editing from a Model 37 as easily as 
we now can with a 2741 terminal, we will be able to achieve manuscript automation literally at the 
author's desk. Such automation at the source is now technically feasible at NBS and shows promise of 
substantial savings in time and money. 

Figure 4. A portion of a spectroscopic table from information supplied on ordinary punched cards. Note how 
differently the lines containing pure numerics are treated from those that contain mixtures of letters and 
numbers. This specialized treatment was handled by a general-purpose program called SETAB. 





Figure 5. A portion of the Einstein paper on a Unified Field Theory printed in the Herald Tribune on February 1, 
1929. Today such technical text can be captured in machine readable form on a “scripting” teleprinter in the 
manner shown at the right. Such a single keyboarding can serve three purposes: transmission over communication 
circuits, storage in a computer, and input to a phototypesetting system. 



begin 
read a, b, and y from tape 
$r = \int_0^y \exp(-a^2 x^2) \cos bx \, dx$ 
print r to 3 figs, a to 8 b to i 

Figure 6. ... MIRFAC language which serves as a complete program. Note ... to integrate accurately. 







Figure 7. Improbable as it may seem, the above is a complete program which the Klerer-May compiler accepts and 
returns the value of the integral. The typing of this material is not much more difficult than typing the 
corresponding expression in a manuscript. 



March-April 1968 

Constant pressure flame calorimetry with fluorine. II. The heat of 
formation of oxygen difluoride, R. C. King and G. T. Armstrong, 
J. Res. Nat. Bur. Stand. (U.S.), 72A (Phys. and Chem.), No. 2, 
113-131 (Mar.-Apr. 1968). 

Key words: Bond energy (O-F); flame calorimetry; flow 
calorimetry; fluorine; heat of formation; heat of reaction; 
hydrogen fluoride (aqueous); oxygen; oxygen difluoride; 
reaction calorimetry; water. 

The heats of the following reactions were measured directly in 
an electrically calibrated flame calorimeter operated at one atm 
pressure and 303 °K. 

$\mathrm{OF_2(g) + 2H_2(g) + 99H_2O(l) \rightarrow 2[HF \cdot 50H_2O](l)}$ 

$\mathrm{F_2(g) + H_2(g) + 100H_2O(l) \rightarrow 2[HF \cdot 50H_2O](l)}$ 

$\mathrm{\tfrac{1}{2}O_2(g) + H_2(g) \rightarrow H_2O(l)}$ 

The reactants and products were analyzed for each of the 
reactions. From these heats we calculated the corresponding heats of 
formation, as follows: 

$\mathrm{OF_2(g)}$: $\Delta H^\circ_{f\,298.15} = +24.52 \pm 1.59$ kJ mol$^{-1}$ ($+5.86 \pm 0.38$ kcal mol$^{-1}$) 

$\mathrm{HF \cdot 50H_2O(l)}$: $\Delta H^\circ_{f\,298.15} = -320.83 \pm 0.38$ kJ mol$^{-1}$ ($-76.68 \pm 0.09$ kcal mol$^{-1}$) 

$\mathrm{H_2O(l)}$: $\Delta H^\circ_{f\,298.15} = -285.85 \pm 0.33$ kJ mol$^{-1}$ ($-68.32 \pm 0.08$ kcal mol$^{-1}$) 



Figure 8. Sample entry from NBS Spec. Pub. 305-1 keyboarded on a 44 key terminal on-line to a time-shared text 
editing system. See Figure 9 for the keyboarding convention used in this work. 





[Keyboarded copy of the entry shown in Figure 8, as typed on the IBM 2741. The overstruck characters 
that encode typeface changes, subscripts, superscripts, and Greek letters are not legible in this 
reproduction. The keyboarded entry closes with two sentences not shown in Figure 8: “The uncertainties 
indicated are the estimates of the overall experimental errors. The value of the average O-F bond energy 
in $\mathrm{OF_2}$ was calculated to be 191.29 kJ mol$^{-1}$ (45.72 kcal mol$^{-1}$).”] 



Figure 9. Sample entry as keyboarded on an IBM 2741 into an on-line text editing service. Note the use of 
overstrikes to obtain grid changes, subscripts, and superscripts. This system is in daily use for the production of 
NBS Spec. Pub. 305 and its supplements and numerous conference proceedings. The next figure shows the same 
material keyboarded on a scripting typewriter with 126 printable characters. 

[Keyboarded copy of the entry shown in Figure 8, as typed on the Model 37 Teletype, with subscripts, 
superscripts, and Greek characters in their natural positions. The scripted copy is not legible in this 
reproduction.] 



Figure 10. Sample entry as keyboarded on a Model 37 Teletype. On this terminal Greek characters as well as 
subscripts and superscripts appear in natural form for easy proofreading. The numbered and circled symbols n, i, b, 
s, and g are keyboarded in red. They signal changes respectively to the following grids: Roman (normal), Italics, 
Bold, Symbol, and Greek. The indentations in the copy are achieved with multiple tabs which control the 
systematic type face changes and other formal characteristics of the typeset copy shown in Figure 8. 




COMPUTER USAGE IN A LARGE DATA CENTER 



James I. Vette 

National Space Science Data Center 
Goddard Space Flight Center 
Greenbelt, Maryland 



1. Introduction 

In order to present the various ways in which computers are used at the National Space 
Science Data Center (NSSDC), it will be necessary to give a brief description of the total activity of 
the Center. (More detailed information can be obtained from other documents; see references 1-6.) 
In that way, one can see what we mean by a large data center. I'm sure that there are some in the 
audience who are associated with larger facilities, but compared to the general IAC’s identified in the 
COSATI Directory, NSSDC represents a large data center. 

NSSDC is responsible for the acquisition, organization, storage, retrieval, announcement and 
dissemination of the scientific data obtained primarily by satellites. To a lesser extent, we are 
involved with the results from experiments carried out on sounding rockets, probes, high-altitude 
aircraft and balloons. The size of the data base involved is given in Table 1. It can be seen that data 
are stored on magnetic tapes, punched cards, microfilm, photographic films, and prints, as well as 
hard copy. 

One of the main functions of NSSDC is to provide data and information to qualified users and 
to refer others to appropriate sources for the services they seek. Our users are generally 
scientists, engineers, college level teachers, and students who wish to use the data in some scientific 
investigation or for some instructional purpose. The casual seeker of knowledge about the space 
program and its scientific results is referred to the appropriate sources for public information. A 
measure of the activity in the reproduction of data and data products is given in Table 2. We will 
be concerned in this talk only with that reproduction where computers are utilized. 

In addition to serving as a data and information center, NSSDC also performs as an IAC in 
analyzing and synthesizing some of the vast quantities of data in its archive so that new and useful 
forms of the data are available. In this analysis work the computer is used extensively. 



2. Availability of Computer Systems 

Before discussing in some detail the specific uses of computers in performing the functions of 
NSSDC, we list the various computers readily available to the Center. There are ten large 
general purpose computers and numerous smaller ones at Goddard Space Flight Center (GSFC), 
where NSSDC is located. Four of the large computers are used by the Data Center for various tasks. 
An IBM 7094 MOD II running in the conventional batch processing mode is located in the Data 
Center building and is heavily utilized. An IBM 360/91 and a 360/75, both operating in a 
multiprogramming, variable task mode (MVT), are also used to a lesser extent. There are terminals at 
the Data Center which allow for remote job entry to these two computers. In addition, an interactive 
system employing APL (A Programming Language) is available through a typewriter terminal to an 
IBM 360/95, which is the largest computer at GSFC. Plots and microfilm outputs are 
available through an S-C 4020 and an S-D 4060. There are Cathode Ray Tube (CRT) terminals with 
light pens (IBM 2250’s) available for the 360 computers on a limited basis for special development 
work. 



Table 1. VOLUME OF DATA AT NSSDC (12/31/70) 

FORM                                               CUMULATIVE 

Sheets and Bound Volumes, Sheets                      166,724 
Digital Magnetic Tapes, 1/2 inch x 2400 feet           11,328 
Microfilm, 100-foot Rolls                              16,177 
Photographic Films: 
    9-1/2-inch width, linear feet                      18,000 
    70-mm width, linear feet                          310,482 
    16-mm width, linear feet                            8,840 
    35-mm width, linear feet                          759,769 
    4 x 5 inch, each                                    9,186 
    8 x 10 inch, each                                   2,410 
    16 x 20 inch, each                                     93 
    20 x 24 inch, each                                  8,005 
Photographic Prints: 
    9-1/2-inch width, linear feet                       9,000 
    70-mm width, linear feet                           22,000 
    8 x 10 inch                                         7,035 
    11 x 14 inch                                          500 
    16 x 20 inch                                           93 
    20 x 24 inch                                        3,200 
Punched Cards                                          37,700 



Table 2. 1970 NSSDC REQUEST OUTPUT 

                                                      NUMBER OF 
                                                      REQUESTS     TOTAL AMOUNT 
MEDIUM                                 UNIT           COMPLETED    OF OUTPUT 

Digital Magnetic Tapes                 2400' Reels       123          655 
Punched Cards                          Cards              65          77936 
Computer Printout                      Sheets            223          64700 
Microfilm                              Reels             202          2520 
Hard Copy                              Pages             356          62276 
Photo 
  LUNAR ORBITER                                          181 
    Positives or Negatives             Each/Feet                      4352/2584 
    Black & White or Color Prints      Each                           5778 
    35 mm x 100 feet                   Reels                          114 
  SURVEYOR                                                16 
    Positives or Negatives             Each                           75 
    Black & White or Color Prints      Each                           81 
  GEMINI                                                  47 
    Positives or Negatives             Each                           21 
    Black & White or Color Prints      Each                           97 
  NIMBUS                                                 178 
    Positives or Negatives             Each/Feet                      729/7445 
    Black & White or Color Prints      Each                           2078 
  MARINERS 6 and 7                                        39 
    Positives or Negatives             Each/Feet                      153/4050 
    Black & White or Color Prints      Each                           5533 
    35 mm x 100 feet                   Reels                          5 
  APOLLO                                                 239 
    Positives or Negatives             Each/Feet                      496/22278 
    Black & White or Color Prints      Each                           6487 
    35 mm x 100 feet                   Reels                          58 



Table 3. COMPUTER USAGE, 1970 

COMPUTER PRODUCTION-1970 

                                          Computer 
CATEGORY                  COMPUTER        Time (Hours)    Man-Years 

A. Processing             IBM 360/75           12             4 
                          IBM 7094            519 
B. Requests               IBM 360/75           30             3 
                          IBM 7094            379 
C. Information System     IBM 7094            645             3 
D. Analysis               IBM 360/91            5             3 
                          IBM 360/75           18 
                          IBM 7094             92 

PROGRAM DEVELOPMENT-1970 

                                          Computer 
CATEGORY                  COMPUTER        Time (Hours) 

A. Processing             IBM 360/75            2 
                          IBM 7094            101 
B. Requests               IBM 360/91            1 
                          IBM 360/75            3 
                          IBM 7094             42 
C. Information System     IBM 360/75           17 
                          IBM 7094            493 
D. Analysis               IBM 360/91            2 
                          IBM 360/75            3 
                          IBM 7094             13 

Man-Years (values as printed, in order): 3 1/2, 1 1/2, 8, 2, 3 



3.0 Computer Usage 

We will discuss the computer usage in four functional categories: (a) processing of data into 
the archive, (b) responding to requests for machine sensible data, (c) storing and retrieving 
information from the information system, and (d) analyzing data. For the calendar year 1970 we 
show in Table 3 computer time used both in computer program development and in the production 
running of these programs on the various computers. In addition approximately 1000 terminal hours 
were logged on APL for analysis of data. The approximate amount of effort in man-years is also 
given in Table 3 for each category. For program development the effort is for computer 
programming and for production work this represents tape handling, job submission, setting up the 
various computer runs, and handling the resultant outputs. This latter effort does not include the 
operation or maintenance of the computer facility since this is not the responsibility of the Data 
Center. 

We will now discuss in more detail the type of work accomplished through the use of the 
computer in these four categories. 





3.1 Processing 



The data received in machine sensible form are nearly always on digital magnetic tape, although 
occasionally punched cards are used. Analog tapes and punched paper tape are practically never used 
as storage media for the type of data NSSDC archives; consequently the necessary equipment to 
handle these is not available at NSSDC. Although the raw data from the satellites are collected by 
various tracking networks operated by NASA, USAF, foreign countries, and ESRO, the reduced and 
analyzed data which NSSDC uses are collected directly from the principal investigators in charge of 
the individual experiments and responsible for the first or prime analysis of the data. Consequently 
the data have been processed by a wide variety of digital computers, and the magnetic tapes we receive 
are coded in forms appropriate for these computers. Unfortunately there is a great degree of 
incompatibility between the various computers in terms of the character size (6 or 8 bits), parity (odd 
or even), word size (16, 24, 36, 48, or 60 bits), as well as the specific meaning of a string of bits in terms 
of a number or a letter. For those not so familiar with computer jargon, the situation is analogous to 
different cultural languages (not to be confused with programming languages such as COBOL, 
FORTRAN, etc.). 



In order to handle this problem it is necessary to translate from one computer language to 
another. Fortunately the problem is really one of transliteration since there is no syntax involved. 
The processing of the incoming data accomplishes the following functions: (a) verify that all 
the data are readable from the tape, (b) verify that the format of the data has been correctly specified 
and documented by the sender, (c) make an index or catalog of each tape, and (d) produce a new 
self-documented tape in the “language” of our local computer which includes the index and format 
information. A generalized Data Base Management System has been under development to perform 
these tasks in a straightforward manner. The heart of this system is really a problem oriented 
language which allows our people to specify easily how much data is to be retained from the original 
tape and what the organization of the data will be on the new standard tape. The system also includes 
a set of programs which can operate on the standard tape to provide various checks, produce 
specified plots or printouts, do statistical analyses, and produce the necessary index for each tape. In 
addition the system contains the necessary programs to produce an output tape from the standard 
tape in any of the common computer “languages,” so that the user who has requested specific data will 
have no trouble in entering this directly into his own computer without having to perform the 
transliteration process. 
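The transliteration step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Data Base Management System itself: the function names are hypothetical, the default word and character sizes (36 and 6 bits) are just one of the combinations mentioned earlier, and the demonstration code table is invented for the example and does not reproduce any real machine's character set.

```python
def unpack_chars(word, word_bits=36, char_bits=6):
    """Split one machine word into character codes, leftmost character first."""
    if word_bits % char_bits:
        raise ValueError("word size must be a multiple of character size")
    mask = (1 << char_bits) - 1
    count = word_bits // char_bits
    return [(word >> (char_bits * (count - 1 - k))) & mask for k in range(count)]


def has_valid_parity(frame, odd=True):
    """Check a tape frame whose one-bits (data plus parity bit) must total odd or even."""
    ones = bin(frame).count("1")
    return (ones % 2 == 1) if odd else (ones % 2 == 0)


def transliterate(words, table, word_bits=36, char_bits=6):
    """Map every character code in `words` through `table` (source code -> local character).

    Codes missing from the table are rendered as "?" so that they can be reported.
    """
    out = []
    for word in words:
        for code in unpack_chars(word, word_bits, char_bits):
            out.append(table.get(code, "?"))
    return "".join(out)


# HYPOTHETICAL code table for illustration only: codes 0-9 are digits,
# 10-35 are the letters A-Z, and 36 is a blank.
DEMO_TABLE = dict(enumerate("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "))
```

Because only a table lookup is involved, supporting another source machine amounts to supplying a different table and a different pair of word and character sizes, which is the sense in which the problem is one of transliteration rather than translation.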



3.2 Requests 

The processing of the machine sensible data described in the preceding section has prepared 
these data so that they can readily be retrieved in part or whole and output in a variety of forms for the 
greatest convenience and ease of use by the requester. These outputs include computer printouts of 
the data in various tabular formats, plots of selected data, and a magnetic tape that is compatible with 
the requester’s computer. In addition there are some programs which convert the data on tape to 
specialized outputs which are extremely useful. An example of one of these outputs is shown in 
Figure 1. This is known as a grid print Mercator map, which provides a black body temperature as a 
function of position from the earth viewing infrared radiometers on board the Nimbus Meteorological 
Research Satellites. These maps can also be produced in stereographic projection about any point 
on the earth’s surface in scales of 1:10, 20, or 30 million. This particular map shows Typhoon 
Marie, a storm in 1966 which was observed by the Nimbus II Satellite. In addition a data population 
map can be produced which gives the number of data points used to determine the average 
temperature of the grid print map. Occasionally special computer programs are written to select 
data on the basis of criteria specified by the user or to perform certain averaging of the data. 
However, most requests are for data covering specific time intervals. One program was written to 
determine when certain satellites would intersect specified field lines of the earth’s magnetic field. 
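The relation between a grid print map and its data population map can be sketched in a few lines. The Python below is an illustration only, not the NSSDC program: it assumes the input is a list of (latitude, longitude, temperature) samples, uses the standard Mercator ordinate y = ln tan(π/4 + φ/2), and takes cells of equal size in longitude and in y; the function names and the cell size are hypothetical.

```python
import math
from collections import defaultdict


def mercator_y(lat_deg):
    """Mercator ordinate on a unit sphere for a latitude given in degrees."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))


def grid_print(samples, cell_deg=1.0):
    """Average temperature and count the samples falling in each grid cell.

    samples: iterable of (lat_deg, lon_deg, temperature_K) tuples.
    Returns (means, counts), both keyed by (row, col); `counts` plays the
    role of the data population map.
    """
    step = math.radians(cell_deg)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, temp in samples:
        cell = (
            math.floor(mercator_y(lat) / step),
            math.floor(math.radians(lon) / step),
        )
        sums[cell] += temp
        counts[cell] += 1
    means = {cell: sums[cell] / counts[cell] for cell in sums}
    return means, dict(counts)
```

Printing `means` row by row as rounded integers would give the character-grid form of the map, while `counts` shows how many observations stand behind each printed value.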

3.3 Information System 

In order to keep track of the extensive supporting information that must be supplied to 
users along with the scientific data itself, a number of computerized files are used. These files 
constitute the major part of the total information system of NSSDC. A whole range of reports can be 
printed out periodically from these files. In addition specialized inquiries can be made with the 
coding of simple computer programs. 

The Automated Internal Management File (AIM) is used to store information about the 
satellites, experiments and data sets. There are about 50 different items connected with each of the three 
types of entries. As of December 31, 1970 this file accounted for 1380 satellites, 1824 experiments 
and 575 data sets. This file is used to produce the catalog of the Data Center’s holdings as well as 
brief descriptions of various satellites and their experiments which appear in published compilations 
from time to time. In addition management information about the file and the status of acquiring and 
processing the data are available. 

A second file is the Technical Reference File (TRF) in which information about all the 
documents (published and unpublished) concerning the satellites, experiments, rocket and balloon 
flights and appropriate aircraft flights is kept. Besides the author, title, and bibliographic notation, an 
internal classification of the document and location is produced. Keywords are assigned by our staff 
of space science professionals to relate the article to the appropriate satellite, experiment, specific 
disciplines, geophysical events, and other items of interest to our users. This is an extension, for a 
small subset of documents, to the extensive indexing, keywording, abstracting, storing and retrieving 
of the aerospace literature performed by NASA through its Scientific and Technical Information 
Facility (STIF) and through the AIAA. The TRF is used to produce various types of bibliographies 
as well as provide management information about the scientific output of various investigators and 
satellite missions. 
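The way keywords tie a document to satellites, experiments, and disciplines can be illustrated with a small keyword index. The Python sketch below is hypothetical throughout: the record fields are a reduced stand-in for the TRF entry (author, title, bibliographic notation, keywords), and the names `Reference`, `build_keyword_index`, and `bibliography` are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Reduced stand-in for one reference-file entry."""
    author: str
    title: str
    citation: str
    keywords: list = field(default_factory=list)


def build_keyword_index(references):
    """Map each keyword (case-insensitive) to the references that carry it."""
    index = defaultdict(list)
    for ref in references:
        for keyword in ref.keywords:
            index[keyword.lower()].append(ref)
    return index


def bibliography(index, keyword):
    """Return formatted entries for one keyword, ordered by author and then title."""
    refs = sorted(index.get(keyword.lower(), []), key=lambda r: (r.author, r.title))
    return [f"{r.author}, {r.title}, {r.citation}" for r in refs]
```

A bibliography for a given satellite or investigator is then a single lookup, and counting the entries under each keyword gives the kind of management information mentioned above.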




There are several other computerized files which will only be mentioned briefly here. The 
bookkeeping connected with our request business is maintained in a computerized file called Request 
Status and History (RASH). In addition there is a ROCKET file for keeping track of all the rockets 
launched throughout the world carrying space science experiments. A distribution file is used to 
maintain the names and addresses of people in various categories, and one output of this file is printed 
gum labels for mailing purposes. A Data Set Inventory System that is used to keep track of all our 
data products, their location and status is in the process of being completed. In addition an 
Extraterrestrial Photographic Information Center File is used to supply supporting information about 
our photographs including descriptors about some of the subject matter contained in the pictures. 

None of the information system files described is extremely large. The total number of 
characters in some of these is given in Figure 2 and the number of transactions per month for AIM 
is given in Figure 3. The importance of the information system to the operation of the Data Center 
can be judged from Table 3, where one sees that more computer time and program development effort 
have been used there than in any other area. However, during the present year the processing category 
will require the maximum computer usage as we begin to process the large quantities of data now coming in. 
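The comparison of categories can be checked by totalling the 1970 computer hours listed in the computer production and program development table. The short Python sketch below does only that arithmetic; the dictionary layout and the function name are illustrative choices, and the approximately 1000 APL terminal hours mentioned in Section 3.0 are not included.

```python
# Computer hours for 1970, by category, phase, and computer.
HOURS_1970 = {
    "Processing": {
        "production": {"IBM 360/75": 12, "IBM 7094": 519},
        "development": {"IBM 360/75": 2, "IBM 7094": 101},
    },
    "Requests": {
        "production": {"IBM 360/75": 30, "IBM 7094": 379},
        "development": {"IBM 360/91": 1, "IBM 360/75": 3, "IBM 7094": 42},
    },
    "Information System": {
        "production": {"IBM 7094": 645},
        "development": {"IBM 360/75": 17, "IBM 7094": 493},
    },
    "Analysis": {
        "production": {"IBM 360/91": 5, "IBM 360/75": 18, "IBM 7094": 92},
        "development": {"IBM 360/91": 2, "IBM 360/75": 3, "IBM 7094": 13},
    },
}


def total_hours(table):
    """Sum production and development hours over all computers for each category."""
    return {
        category: sum(sum(machines.values()) for machines in phases.values())
        for category, phases in table.items()
    }


if __name__ == "__main__":
    for category, hours in sorted(total_hours(HOURS_1970).items(), key=lambda kv: -kv[1]):
        print(f"{category:20s} {hours:5d} hours")
```

On these figures the information system accounts for 1155 hours, against 634 for processing, 455 for requests, and 133 for analysis.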



We are in the process of putting a portion of our information files in an on-line terminal 
operated system that is commercially available. We hope to determine from this experiment our full 
requirements for an on-line system and to measure the change in efficiency of our operation using 
this service. 



3.4 Analysis 

Most of the data collected by NSSDC cannot be understood directly in terms of simple 
physical processes since there are many competing physical phenomena occurring during the 
measurements. The analyses conducted by the Data Center emphasize the selection of data from a 
large number of experiments in order to produce tractable models of the various environmental 
conditions that exist in space. In many cases these models are strictly empirical; in other cases 
theoretical ideas provide parameters which can be determined from the data. In one sense, such 
models represent data compression and a theory which explains fairly completely a given class of 
observations represents the maximum compression. The results of such syntheses are data products 
generally useful to a broader class of users than those capable of working with the basic observational 
data. 

Since this analysis work involves lengthy computations as well as the handling and display of 
large amounts of data, computers are used extensively in this work. Optimization of instrument 
parameters, transformation of coordinate systems, transformation of physical quantities, non-linear 
regression analysis, correlation of selected physical quantities, time series analysis, orbit 
computations, and graphical displays are the main functions accomplished by computers in carrying out 
these tasks. 

Several specific examples will be given. Energetic protons are trapped or contained in the 
earth’s magnetic field. In order to obtain a fairly complete mapping of these particles with energies 
above 50 million electron volts (MeV), 21 different experiments were studied. Some synthesized 
results are shown in Figure 4. The “best value” particle flux contours are given in a magnetic 
coordinate system which takes the real shape of the earth's magnetic field into account. More details 
on the trapped radiation model environment are given in references 7-12. 
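To give a feeling for the B, L coordinates, the sketch below evaluates them for an idealized, centered dipole field. This is only a first approximation written in Python for illustration: the models discussed here use the real shape of the earth's field, the equatorial surface field value of 0.31 gauss is an assumed round number, and the function names are hypothetical. In a dipole, a field line satisfies r = L cos²λ and the field strength is B = (B0 / r³) · sqrt(1 + 3 sin²λ), with r in earth radii and λ the magnetic latitude.

```python
import math

B0_GAUSS = 0.31  # assumed approximate equatorial surface field of a dipole earth


def dipole_L(r_earth_radii, mag_lat_deg):
    """L of the dipole field line through (r, magnetic latitude).

    L is the distance, in earth radii, at which that line crosses the equator.
    """
    lam = math.radians(mag_lat_deg)
    return r_earth_radii / math.cos(lam) ** 2


def dipole_B(r_earth_radii, mag_lat_deg, b0=B0_GAUSS):
    """Dipole field strength in gauss at (r, magnetic latitude)."""
    lam = math.radians(mag_lat_deg)
    return b0 / r_earth_radii ** 3 * math.sqrt(1.0 + 3.0 * math.sin(lam) ** 2)


if __name__ == "__main__":
    # A point 1.5 earth radii from the center at 30 degrees magnetic latitude.
    print(f"L = {dipole_L(1.5, 30.0):.2f}, B = {dipole_B(1.5, 30.0):.3f} gauss")
```

Every point on one field line shares the same L, so a flux map in B and L collapses measurements taken at many different longitudes and altitudes onto a single two-dimensional plot.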

Another example is shown in Figure 5 where the orbits of several different satellites, which 
were operating at the same time, are displayed in a coordinate system (solar-ecliptic) in which certain 
experimentally identifiable boundaries, the bow shock and the magnetopause, remain stationary. 
Although some of this work is reflected by the computer usage shown in Table 3, much of it is 
accomplished using the interactive APL terminal. One of our scientists has used this system to 
determine on-line the optimum values to assign to any given broad band X-ray detector (reference 13). 
The parameters specifying the detector are entered, the forms of the X-ray spectrum can be chosen, and 
the answer is returned immediately. Different parameters and spectral forms can be used to cover all 
possible situations. 

In addition some efforts are underway to improve our general data manipulation and display 
capabilities through the use of problem oriented languages with the existing computer tools 
available to us at the present time. As one can see, the computer plays a vital role in our Data Center. 
We are looking forward to the time when new high density storage devices and interactive time 
shared computers with graphic displays can be used to give us new capability in the storage, retrieval, 
display, manipulation, and analysis of our growing data base. 



Figure Captions 



Figure 1 Grid print map of Typhoon Marie. The black body temperature derived from the High 
Resolution Infrared Radiometer on the Nimbus II satellite is printed in degrees Kelvin as a 
function of geographic position using a Mercator projection. 

Figure 2 The size of various information files at NSSDC. 

Figure 3 The number of monthly transactions for the Automated Internal Management (AIM) 
File. A transaction is either a new entry, a correction, or a deletion to the file. 

Figure 4 A B-L flux map for protons. The contours are for omnidirectional flux; this flux gives the 
number of protons above 50 MeV that would enter a sphere of cross sectional area of one 
square centimeter in one second. The B coordinate is the intensity of the earth’s magnetic field and 
the L parameter is a quantity that labels a given field line. The L parameter can be interpreted as 
the distance from the center of the earth at which the field line crosses the geomagnetic equator. This 
B, L coordinate system is used extensively in the study of the earth’s radiation belts. 

Figure 5 Orbits of satellites Vela 6A, Vela 5A, and HEOS-A1. The orbital paths of each satellite 
during the period April 8-12, 1970 are projected onto the X-Y plane of the solar-ecliptic 
coordinate system. The distance units are earth radii. In that system the earth is at the origin, the 
X axis points toward the sun, and the X-Y plane is the ecliptic plane. The magnetosheath is the 
region between the bow shock, shown by the short-dash line, and the magnetopause, shown by the 
long-dash line. These two boundaries are caused by the interaction of low energy protons from the 
sun (the solar wind) and the earth’s magnetic field. 



THE NATIONAL STAKE IN BETTER TECHNICAL INFORMATION



James H. Wakelin, Jr.,

Assistant Secretary of Commerce for Science and Technology 

It has been eight years since publication of the bible of the information analysis profession,
the Weinberg Report, entitled Science, Government, and Information.



Since then, we have made progress in all three elements of that title - in science, in
Government, and in information - but our progress has been halting.

First, science and technology are no longer the glamor fields they were then. In fact, I am
very much troubled by the attitudes of anti-science and anti-technology we see so frequently.



Second, Government is admittedly poorly organized to cope with society’s demands upon it. 
This is particularly true of Government’s agencies concerned with science and technology. That is 
why President Nixon recently proposed to the Congress to reorganize the Executive Branch around 
the great contemporary purposes of government. His plan would bring government closer to the 
people, simplify program coordination and conflict resolution, and permit clearer assignment of 
authority and accountability. 

Third, information multiplies and accumulates too rapidly for our over-burdened scientists
and engineers to process it - or for our antiquated governmental institutions to use it efficiently.
Data-processing technology and data-producing organizations have outpaced the supply of human
beings qualified and organized to handle it.

Despite our massive social problems - of education, environment, cities, transportation,
housing, and the like - I remain optimistic that we can solve them, using information to do so. I
believe that there is a latent respect among the American people for science and technology. Society 
can achieve its loftiest ambitions, but it requires these tools to do so. Anti-science and 
anti-technology attitudes can be made to yield to persuasion, because, in the end, science and 
technology are essential to the achievement of society’s goals. We need a massive infusion of 
confidence, which I believe can come from enlightened young people. Perhaps some of the 
commencement speakers who are just around the corner will tell us how. Consider the remarks which 
follow to be my commencement speech to you. 

The Changing Role of the Information Analysis Center 

The publication of the Weinberg Report in 1963 was a milestone. While laboratories
performing some of the information analysis functions had existed for a century or more, true IACs
were proliferating at that time.

Information Analysis Centers (IACs) not only have proliferated in the last decade, they
have subtly changed. Alvin Weinberg emphasized transfer of information as an inseparable part of
research and development, in these words:



All those concerned with research and development - individual scientists and engineers,
industrial and academic research establishments, technical societies, government agencies -
must accept responsibility for the transfer of information in the same degree and spirit that
they accept responsibility for research and development itself.

Transfer of information is, of course, an integral part of the R&D process, yet there is more
to the information analysis function than that. In forwarding the invitation from Ed Brady to make
these remarks tonight, Lew Branscomb put a different and significant focus on your profession when
he wrote:

I consider that the major “information problem” the nation faces is not information 
manipulation or transmission but the quality of available information and its interpretation 
for appropriate use. 

In those two quotes we see symbolized the changes of the past eight years: You long ago went
beyond the stage which might be called the transfer of information to a new level of concern over the
quality of information. All scientists and engineers are involved in the transfer of information. But
the IACs are both “meccas” and “mechanisms.” They are the meccas for comprehensive, quality
information and its interpretation. And they are the best mechanisms we have for feedback,
completing the loop to assure appropriate information appropriately used.

Relevant information has always been a sine qua non for any expert. But judging the 
relevance of information other than that which he generates is always difficult. The researcher can 
place confidence in the IAC for several reasons. The principal reason is that the operation of IACs
usually takes place in a research atmosphere. This permits top-level experts to take part in analysis
activities, while continuing to do research. It provides means for checking crucial values, for
confirming experimental techniques, or for assessing purity of materials. It brings together experts in 
related fields. 

I can see only one possible disadvantage in this research atmosphere. If it is too “ivory
tower,” that makes it easy for IAC staff members to consider that the specialists in their field are the
primary users of their information. That would be an error, or at least an over-simplification. It is the
nonexpert, or the expert from another field, who has the greatest need for IAC output and services.



The greatest benefit leverage-factor for IAC services is found when the services are used in 
direct application to practical problem-solving. It is no derogation of basic research to state that the 
benefits achieved for society are realized much faster through problem-solving than through basic 
research. 

The role of the IACs is constantly changing. The IACs must plan to provide services and
output which can be used by engineers and applied scientists as well as by basic research workers.
You must provide more and more outputs which will directly contribute to solving major problems
relevant to national needs. Some IACs are best operated by Government, some by universities, some
by research institutes, and some by professional societies. I was happy to note just recently that the
American Nuclear Society is establishing an Information Center on Nuclear Standards.



Six Major Needs in Information Analysis



In his speech to this forum three and a half years ago, Donald Hornig, now president of 
Brown University, spoke of “the responsibility of the information analysis center to try to ensure that 
significant information from all sources is incorporated into the body of related information stored in 
the center.” By the word significant he implied that the IAC staff must use judgment to decide what
information is worth storing for future use and ingenuity to try to locate all information that is worth 
accumulating. 

But this responsibility must not be passed entirely to the IACs. It must be shared by active
workers in each field. In compiling my list of six major needs in information analysis, therefore, I 
borrow the first from Don Hornig: 

1. We need to involve a much larger proportion of the total technical community in
information analysis activities - as users, as participants, or just as supporters, for we are all
potential users and participants. In each of those roles, we should promote the concept of the IACs
through word-of-mouth advertising. Of course, you are primarily interested in your professional field
and the constituency your center serves. But why not take every opportunity to inform both your
professional colleagues and your customers (so to speak) that yours is an information analysis center?
They might call on you with unrelated problems if they knew that an IAC is an organization with a
unique capability for acquiring, selecting, storing, retrieving, evaluating, analyzing, and synthesizing
a body of information in any clearly defined specialized field, perhaps in theirs.

2. We need some new IACs. I cannot believe that the present roster of Federally supported
and other IACs covers all the areas of science, technology, and scholarly interests in which
mechanisms of this sort could contribute to solving national problems. Lew Branscomb has suggested
that NBS, which now operates several IACs in the Standard Reference Data program, shall probably
establish others. These might help fulfill the Bureau’s responsibilities in fire research, environmental
technology, building technology, and other areas. I can imagine unfilled needs in information for
policy analysis at the highest levels of Government. If and when new IACs are established throughout
the nation, in whatever institution, you who constitute the reservoir of knowledge on how to set up 
and run them should offer your full assistance to the newcomers. This is something I know you will 
do. 



3. We need a strengthened National Technical Information Service to support IACs and to
fill gaps, including functions which are not properly those of IACs. NTIS, as many of you know, was
established late last year to bring together many of the technical information functions of the
Department of Commerce. It publishes Federal publications and data files and makes them available
to the business, scientific, and technical communities. This is different from, but a supplement to,
information analysis, and I do not see NTIS taking over any of the functions of your agencies or your
centers. To use my earlier expression, I see it as a “mechanism,” not a “mecca.” I believe that NTIS
can support all of you by providing certain services and products far more efficiently and
economically than you could. Tomorrow afternoon Mr. Harry Pehly of Plastec will discuss such a
proposition. 



Two other useful services of NTIS are the subscription and the standing order services. For 
example, for $10 per year you can subscribe to “Aerospace Medicine and Biology,” a continuing 
bibliography of studies on the biological, physiological, and environmental effects of space flights — a 
joint publication of NASA, the Library of Congress, and the American Institute of Aeronautics and 
Astronautics. For $22 per annum you can subscribe to “Air Pollution Abstracts,” published monthly 
by NTIS for the Air Pollution Control Office of the Environmental Protection Agency. And for the 
same price you can subscribe to the semi-monthly “Selected Water Resources Abstracts.” There are 
Asian serials, Communist China serials, Eastern European serials, USSR serials, and others covering
translations of technical documents from throughout the world. You can receive, free, pamphlets
about these and other NTIS services by visiting the information center in the lobby of the Commerce
Building or by writing the National Technical Information Service, Springfield, Virginia 22151. 

NTIS and the IACs can be mutually supportive, and it is a challenge to both to develop
mechanisms of support. I know that Mr. Knox and Dr. Brady are studying this matter very carefully.



4. We need greater appreciation at the level of assistant secretary - not just in the
Department of Commerce, but in all departments - of the role and value of IACs. Every department
has an Assistant Secretary for Research and Development, by that title or something close to it. He is
a key constituent of yours who should be familiar with the range of services which IACs supply to his
department, or which they should be supplying. Yet I dare say that, with the possible exceptions of
Defense and AEC, not one of your IACs has ever been visited by an Assistant Secretary of a
supporting Federal department. If you have, then the thanks go to Andy Aines and COSATI for the 
high-level support they have provided over the past several years. Support should work two ways, so 
I would make my next recommendation that: 

5. We need, at the IAC level, to support COSATI, for much of what has been accomplished
by IACs we owe to the leadership and coordination it provides to the field. I was Assistant Secretary
of the Navy for R&D when COSATI was formed, so I have enjoyed a close view of it since its 
inception and I can’t praise Andy and the agency representatives on COSATI highly enough. 

6. We need strengthened international cooperation, for all of the areas in which you operate 
are international concerns of science, technology, or scholarship. I know that many of you already
work closely with your counterparts in other nations, and I encourage you to intensify these efforts. 



Three Problem Areas

In approaching the end of my remarks, I would like to discuss briefly three problem areas 
with which I am concerned in my new responsibilities in the Department of Commerce. Yet these
are national problems, not my Department’s alone. Regarding each, let me ask you, “What could 
you contribute to the solution of this problem?” 

The first is: 

International voluntary standardization and certification. On April 28 the Department of
Commerce sent to the Congress an Administration bill designed to promote exports through
strengthened international voluntary standardization and certification activities. This bill would
assign to the Secretary of Commerce the principal Federal responsibility for assuring that U. S.
interests are adequately represented in this field. It also would authorize the Secretary to enter into 
grants or contracts with nonprofit organizations. By helping to write new internationally agreed-upon 
standards, we think that the U. S. will make sure that such standards reflect U. S. engineering 
practice. The legislation also will enable the Government to cooperate with U. S. industry in reaching 
international agreements. And it will create an official U. S. link in the international standards 
making process. I think this is an area in which an IAC, probably located at, and operated by, the 
National Bureau of Standards, will be essential. 



The second problem is: 

Environment. I am told by ecologists that the life sciences boast, throughout the world,
52,000 journals. These are primary publications, which publish at least one, but in many cases two or
more, articles on ecology per issue. To a great extent, this knowledge is being addressed to other
specialists within the same discipline, and to the peer groups within those disciplines. It seldom
reaches across disciplinary boundaries, and even less frequently across national boundaries. There is
a time lag between the acquisition of new knowledge in ecology and its practical application to
problems of environmental management. As we accelerate national and international programs of
environmental management, this lag will impede the development of improved ways of anticipating,
assessing, and solving the problems of environmental deterioration. I find this a second area in which
one or more IACs will be essential. The nature and location I leave to your imagination, and to your 
future planning. 



Coastal Zone Management. As some of you may know, I have spent the past year serving
the Honorable Russell W. Peterson, Governor of Delaware, as Chairman of the Governor’s Task 
Force on Marine and Coastal Affairs. Seven distinguished citizens of Delaware constitute this task 
force. In mid-February we presented to the Governor and the Legislature a preliminary report. 
During the next several months we will complete the final report, with recommendations on the major 
resources of the state including water management, fisheries, and wildlife; recreation including parks, 
boating, and sportfishing; and an extensive treatment of environmental quality including, but not 
limited to, waste disposal, pesticides, protection of the beaches and shoreline; and the problems 
created by mosquitoes and biting flies. The preliminary report was issued because of the urgency of 
certain decisions facing the state concerning the use of its coastal zone. One of its central 
recommendations was that a comprehensive baseline study of the principal water bodies of 
Delaware’s coastal zone be performed, in cooperation with New Jersey, Maryland, the Delaware 
River Basin Commission, and the Federal Government. 

What this suggests to me is that many information analysis centers are needed to provide 
scientists, engineers, and policy-makers with baseline data on all of our coastal zone problems which 
are quantifiable - and most of them are. Think of the opportunities for inter-disciplinary approaches!
Such compilations and analyses will require a joining of scientific, engineering, sociological,
economic, legislative, and communications skills. Such approaches usually will encompass a region of
two or more states. They often will involve international cooperation. Examples are eutrophication
and other problems of the Great Lakes, air pollution from the automobiles and factories of both
Detroit and Toronto, and the many marine problems of the Atlantic, Pacific, and Caribbean coasts.
I find it fascinating to speculate about the possibilities for real public service when new or redirected
information analysis centers focus their attention on coastal zone or other environmental problems.



* * *

I conclude most of my speeches, particularly one like this where so many of my new 
associates in the Department are present, with a little tribute to them. I was pleased to find, on my
arrival in this position the first of March, literally hundreds of highly competent, highly motivated
people, in the National Bureau of Standards and throughout the Department. The Secretary and I
both appreciate what a priceless resource the Nation has in this staff. We are challenged to
experiment, to innovate, and, if necessary, to create new institutions within the Department to 
expand our technological horizons. We want to find out how these people can better use technology in 
its proper role to accomplish the mission of the Department. Its proper role, in our view, is to serve 
people through meeting the needs of business, industry, the environmental community, and other 
nations through the free enterprise system. I am convinced that the information analysis centers share 
with us part of the responsibility of serving people everywhere - not just with transfer of information,
for many others, such as NTIS, do that. You have the major responsibility for assuring the quality of
information and its interpretation for appropriate use. We’d like to tap into your network, into your
plans, yes, even into your dreams and aspirations. For we, too, have a national stake in better
technical information.



THE ROLE OF SECONDARY SERVICES AND INFORMATION ANALYSIS CENTERS



Dr. Russell J. Rowlett, Jr., Editor,

Chemical Abstracts Service 

I have completely re-oriented my thoughts for this presentation in light of the opening day's 
discussion, because I feel, as a representative of a discipline-oriented secondary service, that I must
say some things that need saying even though they are things which perhaps you will not wish to 
hear. 



Let me begin by looking at the six different components of the information world as I see
them. First, of course, is the information creator. This is the author or group who starts the whole
publication process by preparing the research or technological report or patent. Second is the
primary record, the publicly available document produced from the original report by the publisher.
Third are document archives, the storage of the original report by a library which makes it available
to those who seek it. Fourth are data archives, as in an Information Analysis Center, which most of
you here represent. Fifth is the secondary record, produced by Chemical Abstracts Service and other
of the so-called “abstracting and indexing” services. Finally, at the end of the pipeline we have the
information consumer, the user of the information as processed by the other five information
components. 

There is the capability for logical, progressive build-up of an information framework among 
these six components with a minimum of overlap. Traditionally, however, they have all operated 
separately and independently. Under the old framework, this was possible because the human 
intellect - largely the consumer’s intellect - bridged the inconsistencies and inaccuracies, of which
there are many. For example, as was mentioned here yesterday, Chemical Abstracts, Engineering
Index, and BIOSIS found in the very first step of an overlap study that there was difficulty in
identifying the “document packages,” that is, the journals which the three cover. This was so because
the bibliographic citations were inconsistent. The three have never cooperated to arrive at 
standardized document identification, because in the past, the users of the three services could 
intellectually bridge the differences. The library community, even more so than the secondary 
services, continues to rely upon the human intellect to bridge inconsistencies and inaccuracies. 

But the growing use of machine handling plus today’s economic pressures demand that all of 
the components of the information world work together for a standardized and consistently identified 
data package. In principle, there exists in the primary literature a single coverage policy: one
paper by one author or group of authors is published only one time. A similar policy needs to be
promoted and established in secondary abstracting services. We are working toward this end. The 
scientific community can no longer afford to have several secondary services analyzing, abstracting, 
and indexing the same document. And, in my opinion, information analysis centers should be 
working toward the same elimination of duplicate intellectual effort. We need a standardized 
identifier for the bibliographic citation and the bibliographic package, and we need a routine 
procedure for the user to obtain this package from his local library. 
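
The need for a standardized identifier can be illustrated with a minimal sketch. It assumes, purely for illustration, that each service records a citation as a journal title, year, volume, and first page, and that simple normalization of case, punctuation, and spacing is enough to reconcile the variant spellings. The field names, function names, and sample citations are hypothetical and are not drawn from the records of any of the three services.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    """One bibliographic citation as a service might record it (hypothetical fields)."""
    journal: str
    year: int
    volume: str
    first_page: str


def citation_key(c: Citation) -> str:
    """Build a normalized key so spelling variants of one citation compare equal."""
    # Lowercase the journal title and drop everything except letters and digits.
    journal = re.sub(r"[^a-z0-9]", "", c.journal.lower())
    # Strip surrounding blanks and leading zeros from volume and page.
    volume = c.volume.strip().lstrip("0")
    page = c.first_page.strip().lstrip("0")
    return f"{journal}:{c.year}:{volume}:{page}"


def overlap(*services: list[Citation]) -> set[str]:
    """Return the keys of documents covered by every service given."""
    key_sets = [{citation_key(c) for c in s} for s in services]
    return set.intersection(*key_sets)


# Invented sample data: the same paper cited three different ways.
service_a = [Citation("J. Org. Chem.", 1970, "35", "1234"),
             Citation("J. Chem. Phys.", 1970, "52", "100")]
service_b = [Citation("J Org Chem", 1970, "035", "1234")]
service_c = [Citation("j. org. chem.", 1970, "35 ", "01234"),
             Citation("J. Chem. Phys.", 1970, "52", "100")]

shared = overlap(service_a, service_b, service_c)
print(sorted(shared))
```

A comparison of the raw citation strings in this sample would find no overlap at all; the normalized key finds the one document that all three cover. Real citation files would need far more than this (agreed abbreviation lists, title changes, translations), which is the argument for agreeing on the identifier at the source.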

The components of the information world must cooperate, not only for elimination of the 
duplicate intellectual analyses which go on at the primary and secondary publishing stages, and, in 
my opinion, at the information analysis center; but also, for elimination of the duplicate input 
keyboarding of identical data. The American Chemical Society is working toward these ends. When
we have the primary publications available in machine language, the secondary services will use
directly the titles, abstracts, citations, references, and eventually even chemical structures. 



Let me emphasize that before Chemical Abstracts is willing to eliminate any of its coverage 
of chemistry in the overlap areas of biology, physics, medicine, etc., we want to be certain we have 
built bridges into our indexes which will allow the user to go directly from the discipline-oriented 
secondary service of chemistry to the discipline-oriented secondary service of biology, physics, or 
engineering. If we do not use the same index terms, then cross-references should guide the user 
without any doubt. Only when this is accomplished will we be ready for what has been called 
“mutually exclusive coverage.”



Andy Aines asked here yesterday morning what the IAC’s and the secondary services could
do to help each other. In my opinion, the IAC’s should start with the abstracts and index entries
available from the discipline-oriented secondary services and build upon them. They should not set 
up duplicate abstracting and indexing services. 

Let’s look very quickly at the nature of the secondary information services. First, they are
document-accessing services. They are not data-accessing services. The secondary services provide
access to the primary documents, the primary literature. But the abstract is not a surrogate. It has
never been and is not today the purpose of the secondary services to replace the primary literature.
Their document-accessing function might be compared to the enrichment of an ore. The secondary
service selects the ore, refines it, and has it ready to process, but does not actually extract the pure
metal. Completion of the process requires separation of specifically needed data from closely
associated material. In my opinion, this is the objective of Information Analysis Centers.

The secondary service focuses on new information, on facts not fancy. This new information 
is not evaluated in the secondary service, and the user, knowing that the accepted values reported by 
the secondary services are not always authentic values, must make selections based on his own 
individual needs and experience. It is the place of the IAC’s to determine which accepted values are
authentic and which are pertinent to the particular task of a specialized user. 



IAC’s can, in addition, provide a level of data identification which is not possible in a general
discipline-oriented secondary service index. Some time ago a CAS survey of the types of data 
reported in chemistry and chemical engineering showed that chemists and engineers are capable of 
measuring almost 1100 different chemical properties, uses, applications, activities, etc. Yet scientists 
continue to demand a guarantee that, every time a particular thermodynamic property related to their 
individual interests is recorded in a primary paper, CAS carry an index entry for that property. Such 
a guarantee is absolutely impossible! But, it is possible for secondary services to do a better job of 
indicating the properties, and the combinations of uses and applications that are recorded in original 
papers. We need your help in this area. We are going to conduct an experiment soon in which we will 
code a number of selected abstracts according to the kinds of properties, uses, and applications that 
are measured. We cannot code for all 1100 properties but we can code for a couple of dozen. We 
need suggestions on which properties and activities will be of most relevance. In this way the
secondary services will also aid the IAC’s in their analytical task, for, I repeat, it is my opinion that
their task begins with the abstracts and the index entries which are available from the 
discipline-oriented secondary services. 

Let me now turn to subject coverage. Here continuity has been considered most essential. We
are concerned about continuity of subject coverage across a given field of chemistry, chemical
engineering, biology, physics, etc., but we also are concerned about continuity throughout a period of
time. Science is a living subject, and the body of information grows with the advance of research
understanding. Years ago when spectroscopy was new, its practitioners convinced Dr. E. J. Crane
that he should put an entry into the Chemical Abstracts indexes every time someone reported spectra
measurements. Today these spectra entries total 30 columns in a six-month volume index.
Spectroscopy is no longer an unusual interest, but these entries have been continued to maintain
continuity.

The definition of that which has wide utility within the entire scientific community, but which
is not so specific as to be useless to a large percentage of those who subscribe to a discipline-oriented
service, is a significant problem. Too much detail wastes the money of the subscriber to the general
secondary service, and yet this is the detail which is needed to operate an information analysis center.
Reaching a happy medium should be our goal. Recognizing that we cannot do your whole job, you
should be able to start from what we have done and build from there. At the same time, those who
utilize a secondary service in a special framework should not have to redo the work of the service.

Another coverage problem relates to negative data. Here again we need the guidance of those
who are using secondary publications. Acceptance of negative data in a secondary service is often
based on arbitrary standards. Do you look for an actual quantitative measurement? What do you
index in a paper which reports only plus and minus for the activity of a chemical? We need to know
what level of negative data the user actually requires. But communicating with subscribers and users
is a difficult job for a secondary service. Subscription lists consist of purchasing agents, librarians,
secretaries, etc., rather than people who actually use the service. We need communication from users
and from IAC’s.

Suggestions are needed not only on subject content but also on timeliness. An IAC should
have the abstracts and index entries within a time frame that fits its needs. Briefly let me describe just
what CAS is doing to improve timeliness. We now receive almost all of the basic journals of
chemistry and chemical engineering by airmail in page-proof form. For the Journal of Organic
Chemistry, an ACS periodical, we are actually using manuscript prior to primary publication.
According to the agreements which have been negotiated with the West German Chemical Society
and the United Kingdom Consortium, the ACS will eventually also be provided with their primary
documents in manuscript form after they have been accepted for publication. These international
input centers are performing volume, in-depth indexing simultaneously with abstracting. We are thus
eliminating intellectual duplication and are handling a given document in one professional operation.

In the period of over two years that CAS has abstracted and indexed from the original
manuscripts of the Journal of Organic Chemistry, many significant errors made by the authors in
chemical structures, molecular formulas, etc., have been detected. We find such errors by input of the
structures to the Registry System. The corrections are returned to the primary journal office in time
to be incorporated into the original published record. Thus are the records of the primary journals
corrected. There are also errors in the secondary service records. At Chemical Abstracts we feel some
subscribers just love to look for errors, and we don’t like to disappoint them. Today we must handle
on the average 1400 abstracts every day. This means prepare, process, edit, and index 1400 abstracts
plus about 15,000 index entries, of all types, each and every day. Even our staff sometimes does not
realize that we probably process daily more characters than most metropolitan newspapers.

Secondary services are subject-oriented in such a way as to have predictable access routes. 
Certainly you demand this for the kind of use you make of secondary index services. A printed 
service must depend on a hierarchy, an order within the index, because it is searched by human 
intellect. It is an organization of access keys. This is particularly relevant to the CAS handling of 
chemical substances, in which an already rigid control of vocabulary has been reinforced in recent 
years by the development of the CAS Chemical Registry System supported by the National Science 
Foundation. Today this Registry System enables staff to retrieve 65 to 70 percent of all the CA index 
names and molecular formulas, edited and verified by the computer without professional 
intervention. This makes possible a significant dollar savings. The names are input just as they 
appear in primary journals, and the complex chemical nomenclature is retrieved. 



It is fortunate that chemistry has a unifying factor, the molecular structure, which is a 
two-dimensional structure with a third dimension that can be interpreted. Thus, it has been possible 
to develop an information system based on this mathematically interpretable structure. Other 
scientific disciplines are not as fortunate and have not been able to develop systems around such a 
central structure. Physicists use Chemical Abstracts because they desire to search by solid-state 
compounds, thin films, etc. Actual surveys show that more non-chemists than chemists use Chemical 
Abstracts. 



But the secondary services cannot provide all of the data values which you need for complete 
analysis. We can only lead you to the sources where you can find the data values. We can give you 
some but not all of the indicated data. 

The complete identification of a given concept often results only from a combination of facts. 
Such facts find their full expression along different axes which are broad in form in several scientific 
disciplines. It is irrational to expect all of these scientific disciplines to put down separately the same 
concept. Yet there must be a compatibility between services. This is the point we are urging on our 
colleagues in the other secondary services: that we build index bridges and clearly indicate them. 

Since the needs of information analysis centers appear to include data from several secondary 
services, the proper index bridging is of great importance to an IAC. With these guideposts, you will 
be able to go from one secondary service index to another. CAS works by collective index periods, 
and for the ninth collective period which begins in 1972, we are studying ways to build such bridges 
between the CA indexes, MEDLARS, AIP Classification Scheme, BIOSIS Basic Index, Nuclear 
Science Abstracts, etc. We are not going to be able to accomplish everything at once, but building the 
bridges will be a beginning. 
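One way to picture such an index bridge is as a table of equivalent headings, so that a searcher holding a heading from one service can be pointed to the corresponding heading in another. The sketch below is only an illustration under assumptions: the service names "Service A" and "Service B", the headings, and the functions `build_bridge_index` and `bridge` are hypothetical and do not describe any planned CAS format.

```python
from collections import defaultdict

# Hypothetical bridge data: each set groups (service, heading) pairs
# that are taken to denote the same concept.
CONCEPTS = [
    {("Service A", "Films, thin"), ("Service B", "Thin films")},
    {("Service A", "Semiconductors"), ("Service B", "Semiconductor materials")},
]


def build_bridge_index(concepts):
    """Map every (service, heading) pair to its equivalents elsewhere."""
    index = defaultdict(set)
    for concept in concepts:
        for entry in concept:
            index[entry] |= concept - {entry}
    return index


def bridge(index, service, heading, target_service):
    """Return the headings in target_service equivalent to the given one."""
    equivalents = index.get((service, heading), ())
    return sorted(h for s, h in equivalents if s == target_service)


index = build_bridge_index(CONCEPTS)
print(bridge(index, "Service A", "Films, thin", "Service B"))
```

A heading with no recorded bridge simply returns an empty list, which tells the searcher that the crossing has not yet been built.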

In conclusion, let’s look at the needs of an Information Analysis Center vs. those of the 
information consumer. In my opinion, a specialized information center is always interested in 
exhaustive records of a given type of data. The information consumer on the other hand is often 
interested in only an accepted value. There are two general kinds of users for Chemical Abstracts. 
One is the man who is looking for an exhaustive record. You see him in the library running his finger 
down the entire page. No matter where you put the entry in the index he will find it, even if he has to 
look cover to cover. He is making a patent infringement or domination search, or he wants to find 
every fact in a new research area. The other man has a pot boiling in his laboratory. He wants one 
value, a melting point, a frequency line of a spectrum, etc. He tears into the library, grabs an index and 
finds one reference. If the data agree with what he’s got, fine. If he can’t find anything, equally fine; 
he’s found something new in the lab. There are differences in the ways of serving these searchers. In 
an information center, you want an exhaustive record, but a secondary service must also satisfy the 
user who needs only the accepted value. 

How can a secondary service help to point out to information analysis centers where specific 
types of data are to be found? I mentioned our experiment in coding. I hope that will be helpful. We 
need additional experiments. Perhaps you have ideas. Another thought: can a secondary service such 
as CAS indicate to information consumers the particular subject fields covered by an information 
analysis center? We have a CAS Source Index. It was formerly the List of Periodicals. It includes the 
library holdings of almost 400 libraries all over the world; almost 30,000 journals are included. Is 
there some way within the confines of this Source Index that we can indicate where a user should go 
for the type of study that an information analysis center can do on a given subject area? I think it is 
possible. We would like your comments. 

I have tried very rapidly to review the components of the information world, to indicate some 
of the problems and some of the interfaces between secondary services and information analysis 
centers. I realize I have only scratched the surface. I look forward to your questions. 




“THE USE OF ABSTRACTING AND INDEXING SERVICES AT THE ERIC 
CLEARINGHOUSE ON LIBRARY AND INFORMATION SCIENCES 
(ERIC/CLIS): A CASE STUDY” 



by 

J. I. Smith 
Associate Director 

ERIC Clearinghouse on Library and Information Sciences 



INTRODUCTION 



The role of the ERIC Clearinghouse on Library and Information Sciences (ERIC/CLIS), 
operated by the American Society for Information Science for the U. S. Office of Education, centers 
about three major functional elements: 

(1) A Clearinghouse center which acts as a catalyst, focal point and agent for the 
acquisition, document processing (cataloging, abstracting, and indexing), announce- 
ment, and dissemination of fugitive reports and journal literature (in effect, a type of 
secondary service), 

(2) An information service center which handles an increasing number of inquiries, 
and serves as a referral, or switching center, to existing sources of information in the 
library and information sciences, and 

(3) An information analysis center which identifies burning issues of current need 
within our scope of coverage, and responds to these by the synthesis and analysis of 
information from the past and current literature. 

Although my main discussion will focus on the information analysis activities of ERIC/CLIS, 
which give us a reason for either using or not using abstracting and indexing services, I would first 
like to briefly describe the ERIC system so that you can fully appreciate our rationale and methods of 
operation. 

ERIC (the Educational Resources Information Center) is a nationwide system established to 
serve the field of education through the dissemination of information on educational resources and 
research materials. 

The total system functions on both a decentralized and centralized basis, and consists of the 
following components: 

(1) The management group within the U. S. Office of Education, called ERIC Central; 

(2) A network of clearinghouses, each with its own subject area of responsibility 
(ERIC/CLIS is one of these Clearinghouses; its subject area being library and information science); 

(3) A central document processing and reference facility, currently operated by Leasco; 

(4) A central source for obtaining copies of documents in microfiche and hard copy. This 
service, called the ERIC Document Reproduction Service (EDRS), is currently operated by Leasco 
Information Products Co.; and 

(5) A centralized contractual endeavor to produce an index of the journal literature in 
education, currently operated by CCM Information Corp. 

CLEARINGHOUSE ACTIVITIES 

Each of the Clearinghouses in the network performs similar functions within its particular 
subject area. These functions include: 

(1) Identification and acquisition of the so-called “fugitive” reports, papers, speeches, etc., 
which are not published through commercial channels, and the identification of core journal articles, 
within its subject field; 

(2) Evaluation of the documents received; 

(3) Cataloging; 

(4) Abstracting and indexing; 

(5) Forwarding the document resume forms (which contain the cataloging information, 
indexing terms and abstract) to the central document processing facility, along with a copy of the 
document itself; and 

(6) Sending journal article resumes to the contractor for production of the journal index. 

The Clearinghouses retain hard copies of each document received for their own library 
collection, and also maintain a complete file of microfiche of all the documents which have been processed 
by the central document reproduction service. These activities provide each Clearinghouse with a 
fairly extensive bibliographical base of fugitive document literature. 



INFORMATION SERVICES 

The document resume forms which are forwarded to the central processing facility by the 
Clearinghouses are put into machine-readable form by the facility, and the resultant tape is used to 
produce a monthly abstract publication called Research in Education (RIE). The journal resumes are 
also put into machine-readable form, for production of the monthly journal index called Current 
Index to Journals in Education (CIJE). Each Clearinghouse automatically receives copies of each of 
the two monthly publications for ready reference use. The input tapes for both of these publications, 
which are updated on a quarterly basis, provide machine-searching and retrieval capabilities. 

Building a data base of the document and journal literature in the field of education is a part 
of the mission of the ERIC system. This data base, along with the broad announcement and document 
availability service provided by the system, makes ERIC a complete information system, unique in 
the field of education. 

I mentioned earlier that ERIC was both a decentralized and centralized system. The 
decentralized portion of ERIC consists of the Clearinghouses which work independently with their 
respective user communities in the way that best meets the needs of those communities, and, at the 
same time, conform to rules and guidelines established by the Office of Education for processing 
documents to meet system standardization requirements. The Clearinghouses then feed the processed 
information into the central document and journal article processing contractors for the production 
of the magnetic tapes, from which the two monthly announcement publications mentioned earlier 
are produced. The tapes themselves are made available for machine searching by system users and 
may be purchased from the central processing facility. 




ERIC/CLIS, along with the other Clearinghouses, has a limited capability for providing 
direct service to users. Financial constraints prevent us from providing great numbers of 
bibliographies, or separate listings of the documents we process. To most effectively reach the 
members of our user community under these restrictions, our announcements are limited to those 
appearing in our newsletters, which contain brief descriptions of the documents which have been 
processed in the Clearinghouse. Also in order to provide broader dissemination of information to 
audiences with specialized interests who may not have access to the ERIC publications, ERIC/CLIS 
sends its document resumes to twelve library and information science journals. Each editor of those 
journals selects and publishes those abstracts which are relevant to the readership of that particular 
journal. Last year, ERIC/CLIS reached nearly 60,000 people on a continuous basis through this 
announcement mechanism, thus providing them with information in their specific subject areas of 
interest. The point is that these information service activities are extremely important as a means of 
keeping ERIC/CLIS constantly before the eyes of the library and information science community, and we 
accomplish this, under our financial limitations, by “piggy-backing” on other existing services in our 
field. As a matter of fact, our input into the ERIC system is used by abstracting and indexing services 
as part of their coverage. 



INFORMATION ANALYSIS ACTIVITIES 



In my opinion, the main, and most exciting, aspect of a Clearinghouse is that each of us also 
serves as an information analysis center for its respective user community. We have constant contact 
with our users through our acquisitions program and our information services activities; our staff is 
kept up-to-date on all developments through the document processing of the fugitive and journal 
literature; and we have a data base consisting of input from the 20 different centers, which means 20 
different subject areas within the field of education. Thus, we are very much aware of what is needed 
in our fields by way of bibliographies, state-of-the-art reports, literature reviews, short papers, etc. 
We do not get into data compilations or quantitative evaluations, as many information analysis 
centers do. Instead, the information analysis publications of ERIC/CLIS are aimed at providing 
information in direct response to the needs of managers, practitioners, research workers and users of 
libraries and information centers, by synthesizing and evaluating existing knowledge in response to 
those needs. 

These special publications are produced by commissioning an expert in the field to write the 
report, paying him an honorarium, supporting him with bibliographies and a machine search of all 
relevant data in the ERIC system in his subject area, obtaining copies of papers, and providing 
minimal funds for typing the manuscript and for local expenses. In other words, our basic 
involvement entails the provision of bibliographic support to the authors. Most of these authors are 
not part of the ERIC/CLIS staff. The reason for this is simply one of economics. Our operation is 
just too small to allow a staff member to take time from his (or her for the sake of women’s rights) 
other duties to write such papers, whereas the staff, working as a whole, can provide much more 
effective support to outside authors who are knowledgeable in a particular topic. 

The overall scheduling for our information analysis products is generally as follows: 

(1) Collect subject areas of interest and concern which have been expressed in 
letters, personal communications, and the literature; 

(2) Evaluate these subject areas to establish priority needs through consultation with 
users and advisory boards; 
(3) Select the target areas to be considered and identify potential authors who have 
capabilities in these areas; 

(4) Communicate with potential authors and establish agreements for honoraria, 
bibliographic support and date of completion, as within six months of agreement date; 

(5) Send letter of confirmation; 

(6) Send author's guide; 

(7) Initiate machine search of the ERIC data base, and send copy of search results 
to author; 

(8) Send updates of that search on a monthly basis to the author; 

(9) Receive rough draft of report and submit to other authorities in the field for 
review and appraisal; 

(10) Send comments from reviewers to authors, and begin editing the report (the 
editing is done by an ERIC/CLIS staff member); 

(11) Prepare the paper for final typesetting and printing; and 

(12) Distribute copies of the report. 

As you can see, bibliographic support keeps everybody quite busy and when you have 
approximately 50 different information analysis projects in motion at the same time, you can well 
imagine the amount of work required to support these projects. We also provide bibliographic 
support to the authors of the Annual Review of Information Science and Technology, as well as 
review publications in the library and information science field. 

THE USE OF ABSTRACTING AND INDEXING SERVICES 

Although the fugitive documents contained in the ERIC data base are invaluable as a source 
of information, we find that they are by no means the complete answer in providing authors with the 
broad range of literature needed to write a good review, or compile an extensive bibliography. 
Therefore, we find it necessary to go to other sources in our field which cover the journal literature 
extensively. Although we routinely process approximately 20 core journals in the field, there are 
more than 500 journals in the library and information science fields, which means that we do not 
approach complete or exhaustive journal coverage, by any means. 

In addition to the data base we have created for the ERIC system there are about four main 
abstracting and indexing services in the library and information science fields. 

The problems in using these secondary services are as follows: 

(1) The terminology and classification schemes of the services differ substantially; 

(2) The material contained within the publications of these secondary services, for 
the most part, is quite old; 

(3) There is a significant overlap in the journals covered by these secondary 
services; and 

(4) The indexes have not been computerized, and only manual searching can be 
done, which is, of course, time consuming. 

The apparent disadvantage of using secondary services as part of our bibliographic support 
is, however, ironically enough, a blessing in disguise because the field of library and information 
science is indeed being thoroughly covered by the abstracting and indexing services in the field. For 
one thing, this means that we can concentrate to a larger extent on the fugitive literature, for which we 
alone provide service of this kind in the field. It also means that our authors have access to these 
secondary tools, which is why we commission only those people who are active and knowledgeable in 
the field, since we assume that these authors are not only aware of these secondary services but are, in 
fact, using them. 



The main point is that we have, first of all, identified the secondary services within our field, 
evaluated each of them for possible application to our own bibliographic support activities, have used 
them on a very selective basis, and are making efforts to make changes. We also scan approximately 
20 primary publications, especially those which have reviews in them which may be of interest to our 
authors, but as yet we have not incorporated these efforts into any kind of organized bibliographic 
activity. One of our special projects is to develop a complete alphabetized listing of all terms in our 
field with a reference as to how that particular term is used by a particular secondary service. This, 
we hope, will be of benefit to us when we do retrospective searching outside of our own data base. 
Another special project we have initiated is that of incorporating all of the references cited in the 
fugitive documents and compiling a type of citation index to these references. We hope to put these 
references into machine-readable form and combine them with the data base we already have. In 
effect we are building our own data base, rather than using abstracting and indexing services except 
on a manual or scanning basis. 
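The citation-index project mentioned above amounts to inverting the reference lists: instead of asking which references a document cites, one asks which documents cite a given reference. The following is only a rough sketch of that inversion; the document identifiers, the reference strings, and the function names `normalize` and `build_citation_index` are hypothetical and do not describe the ERIC/CLIS files.

```python
from collections import defaultdict

# Hypothetical input: fugitive-document identifiers mapped to the
# references each document cites, as transcribed from its reference list.
CITED_REFERENCES = {
    "DOC-001": ["Smith 1968, Library Networks", "Jones 1969, Indexing Costs"],
    "DOC-002": ["smith 1968,  library networks"],
    "DOC-003": ["Jones 1969, Indexing Costs"],
}


def normalize(reference: str) -> str:
    """Fold case and spacing so trivially different forms match."""
    return " ".join(reference.lower().split())


def build_citation_index(cited_references):
    """Invert document -> references into reference -> citing documents."""
    index = defaultdict(set)
    for doc_id, references in cited_references.items():
        for reference in references:
            index[normalize(reference)].add(doc_id)
    # Sort both levels so the listing is alphabetical and stable.
    return {ref: sorted(docs) for ref, docs in sorted(index.items())}


for reference, citing in build_citation_index(CITED_REFERENCES).items():
    print(reference, "<-", ", ".join(citing))
```

Real reference strings vary far more than this simple normalization can absorb, so matching variant forms of the same reference would remain an editorial task.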



THE ASSOCIATION OF SCIENTIFIC INFORMATION DISSEMINATION CENTERS 



For those of you who are interested in using abstracting and indexing services, I suggest that 
you make contact with the Association of Scientific Information Dissemination Centers (ASIDIC). 
The purposes of this Association are: 

(1) To promote the applied technology of information storage and retrieval, as related to 
large data bases containing bibliographic, textual, and fact information; 

(2) To share experiences in information handling through meetings, seminars and workshops; 

(3) To recommend standards for data elements, formats and codes; and 

(4) To promote research and development to provide a more efficient use of existing and 
varied data bases. 

Membership in this group is held by organizations, not by individuals, and information centers 
which meet the following criteria are eligible for full membership: 

(a) Center operations are computer-based; 

(b) Data bases from two or more suppliers are processed; and 

(c) A minimum of 100 user-interest profiles are processed on a continuing basis. 

There is also an associate member status available for suppliers of machine-readable data bases, and 
for other organizations which have an interest in the affairs of the Association, but do not meet the 
criteria for full membership. 
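For a center wondering where it stands, the three criteria above reduce to a simple check. This is a sketch only: the `Center` record and its field names are assumptions made for illustration, and treating every center that falls short as an associate candidate simplifies the rule just stated.

```python
from dataclasses import dataclass


@dataclass
class Center:
    name: str
    computer_based: bool   # criterion (a)
    supplier_count: int    # criterion (b): data-base suppliers processed
    profile_count: int     # criterion (c): user-interest profiles run


def membership_status(center: Center) -> str:
    """Return 'full' when criteria (a)-(c) are all met, else 'associate'."""
    meets_all = (
        center.computer_based
        and center.supplier_count >= 2
        and center.profile_count >= 100
    )
    # Assumption: an interested organization that falls short is
    # treated here as a candidate for associate status.
    return "full" if meets_all else "associate"


example = Center("Hypothetical Center", True, 3, 250)
print(example.name, "->", membership_status(example))
```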

The members of ASIDIC essentially reprocess tapes procured from organizations such as 
Chemical Abstracts Service, Engineering Index, the Institute for Scientific Information, Biological 
Abstracts, and others for their individual information purposes. These member centers have 
developed program packages that are capable of searching multiple data bases. A few of the centers 
which belong to ASIDIC are: the University of Georgia; the Illinois Institute of Technology Research 
Institute, Chicago; the University of Iowa; the University of Nottingham, England; the National 
Science Library in Ottawa; and the University of Pittsburgh. 



Any of you who are interested in providing searching services for your centers from some of 
the major tape services available might want to contact ASIDIC for further information. The 
President is Dr. James L. Carmen of the University of Georgia, and the Secretary, who sends out the 
newsletters and other information, is Miss Diana Follmer, 3M Center, St. Paul, Minnesota 55101. 
You may want to contact her to be put on the mailing list. 




USES OF ABSTRACTING AND INDEXING SERVICES IN IACs 

Robert E. Snider, Director 
Air Force Machinability Data Center 

A case study of the use of abstracting and indexing services by the Air Force Machinability 
Data Center (AFMDC) disclosed very limited utilization of these services. I will explain why we at 
AFMDC have not been able to justify a more extensive use of them. 

First, however, I will describe the scope of operations of our Center for the purpose of showing 
that AFMDC is somewhat different from the centers that deal with chemistry, metallurgy or electronics. 



The Air Force Machinability Data Center is located in Cincinnati and is operated by Metcut 
Research Associates Inc. under a contract with the Air Force Materials Laboratory. At AFMDC 
we collect, evaluate, store and disseminate material removal information including specific and detailed 
machining data for the benefit of industry and government. A strong emphasis is given to 
engineering evaluation for the purpose of optimizing data being disseminated. 

Data are being processed for all types of materials and for all kinds of material removal 
operations such as turning, milling, drilling, tapping, grinding, electrical discharge machining, 
electrochemical machining, etc. 

AFMDC is using a computerized system for storage and retrieval of some 26,000 coded 
documents related to the material removal processes. 



As I stated earlier, we have not been able to make extensive use of abstracting services. One 
of the primary reasons is that there seems to exist a language barrier between abstractors and the 
terminology used within the material removal industry. 

Charles T. Meadow, in his book entitled “The Analysis of Information Systems,”(1) said: 

“Almost all index languages in use are to some extent artificial. A natural language, 
although hard to define, is easy to illustrate. English, French, and German are natural 
languages. They are the languages that people naturally speak. Index languages are 
invented, not for general communication, but for a very special form of communica- 
tion— that of enabling indexers and library searchers to communicate with each other 
and, in a sense, with the documents of the library. The particular role that the 
language is to play will vary with the library, the collection, and the users. Selection 
or design of an index language is probably the single most difficult step in designing 
an information retrieval system; in our opinion, the biggest single reason for this is 
our general inability to predict the performance of human beings when faced with a 
communication system different from that with which they have become familiar. Our 
approach here is to present some basic principles for the design and use of these 
languages, leaving it to the designer of an individual system to apply them to each 
local condition.” 

(1) Meadow, Charles T., The Analysis of Information Systems, John Wiley & Sons, Inc., New York, 1967. 



In trying to establish an AFMDC Interest Profile with some abstracting 
services, we have had somewhat the same experience as an aerospace engineer who 
approached his computer supervisor and said that he would like to find some reports on research 
conducted on bearing materials applicable to high-flying aircraft. Two uniterms were selected for 
the computer search: “bearing” and “high altitude.” Only one document was cited, and upon recovery 
of this document it was found to have the title “Child Bearing in the Himalaya Mountains.” 

In the manufacturing community many of the terms, although natural to the English 
language, have different meanings and thus cause monitors of AFMDC’s interest profile and abstractors 
to cite many documents that are not relevant to our needs. For example, simple words 
such as turning, milling and tapping can relate to various fields. The word milling in one report may 
be describing milling as performed in the ore industry; in another it may be talking about the basic 
powder metallurgy industry. In the material removal field, milling is a term used for cutting material 
on a milling machine by chip making, and it can also be used for a nonconventional machining operation 
of material removal by chemical attack called “Chemical Milling.” 
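The difficulty with bare uniterms can be shown in a few lines. The sketch below is illustrative only: apart from the Himalaya title quoted earlier, the documents, their index terms, and the `field` label are invented for the example, and the function `search` is a hypothetical stand-in for a profile match, not any abstracting service's actual program.

```python
# Hypothetical indexed documents. Only the first title comes from the
# anecdote above; the other titles, the terms, and the field labels
# are invented for this illustration.
DOCS = {
    "doc-1": {
        "title": "Child Bearing in the Himalaya Mountains",
        "terms": {"bearing", "high altitude"},
        "field": "medicine",
    },
    "doc-2": {
        "title": "Chemical Milling of Titanium Sheet",
        "terms": {"milling", "titanium"},
        "field": "material removal",
    },
    "doc-3": {
        "title": "Milling of Iron Ore Concentrates",
        "terms": {"milling", "ore"},
        "field": "mineral processing",
    },
}


def search(docs, required_terms, field=None):
    """Return ids of documents carrying every required term.

    If field is given, also require that the document was assigned to
    that field, which is the kind of judgment a reviewer familiar with
    the industry supplies.
    """
    hits = []
    for doc_id, doc in docs.items():
        if not set(required_terms) <= doc["terms"]:
            continue
        if field is not None and doc["field"] != field:
            continue
        hits.append(doc_id)
    return sorted(hits)


print(search(DOCS, {"bearing", "high altitude"}))         # a false drop
print(search(DOCS, {"milling"}))                          # two senses of milling
print(search(DOCS, {"milling"}, field="material removal"))
```

The first search matches both terms yet retrieves nothing on aircraft bearings, and the second cannot tell chip making from ore processing; only the added field judgment separates them.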

Thus at AFMDC, we have established our own abstract review using personnel familiar with 
the manufacturing industry and trained in the knack of recognizing documents relevant to our needs. 

At the present time, we are searching the following abstracts for acquisition and in so doing 
have developed a knowledge of where within many of these our field of interest normally appears. 
Certain areas of these abstracts have proven to be the most productive to AFMDC: 

Metal Abstracts (ASM) 

International Aerospace Abstracts (IAA) 

United States Government Research & Development Reports (USGRDR) (U.S. Dept. 
of Commerce) 

Scientific and Technical Aerospace Reports (STAR) 

Aerospace Research Applications Center (ARAC) 

Current Awareness Programs from DCIC, etc. 

NASA Technical Briefs, etc. 

Materials Information Bulletin - AFML (Contracts) 

Some periodicals, especially foreign ones, have abstracts sections 

Publication listings in society journals 

I would like, in closing, to say that we at AFMDC certainly appreciate the valuable time-saving 
contribution abstracting services are providing the information community involved with the 
majority of fields of science. However, AFMDC could utilize them to a fuller extent if both abstractors 
and interest profile monitors in some abstracting services were more oriented to the material 
removal industry. 
A PROFILE OF SCIENTIFIC TECHNICAL TAPE 
INFORMATION SERVICES 



John M. Gehl and Vladimir Slamecka 
School of Information and Computer Science 
Georgia Institute of Technology 
Atlanta, Ga. 30332 

Introduction 



We wish to trace the outline of some of the main features of scientific-technical tape services 
which have been developed during recent years. In preparing this profile, we have drawn almost 
exclusively on Kenneth D. Carroll’s Survey of Scientific-Technical Tape Services published by the 
American Institute of Physics in September 1970. Although not quite up-to-date, this survey suffices 
for our purpose of exhibiting commonalities and variations of characteristics of these services. 

To begin, we quote briefly from the motivation section of the Carroll report, which puts the 
results of the AIP survey in perspective: 



“During the past few years there have been an increasing number of tape services 
entering the information resources market. Each of these services makes available to 
a library or information center, on a continuing basis, computer-readable data which 
can be utilized in as many diverse services as the center’s programs and clientele 
require. As these services increased, it was sometimes a problem for libraries and 
information centers to keep up with all the various data bases available, the subject 
areas covered by the tapes, whether the organization offering the tape performed 
in-house services upon request, or if software was available to the subscriber. In the 
preliminary survey reported here, we have tried to compile a directory of current tape 
services, listing for each service the general characteristics of the data base.” (Page 2) 

In recognition of problems such as those mentioned, the Carroll study solicited information 
from representatives of all known commercially available tape services (including two services 
offered by Federal agencies: ERICTAPES, from the Educational Resources Information Center; and 
U. S. Government R&D Reports, from the National Technical Information Service). Information 
obtained from these inquiries is shown in the report under the following categories: 



Name of Tape 

Source 

Contact: Representative for further information 

Characteristics of the data base (including: subject matter; 
types of source items input; methods of subject analysis or indexing; 
searchable data elements; availability of abstracts; and time span 
available) 

Frequency of tape issue 

Average number of source items per tape 

Subscription cost or leasing details 

Software availability 

Type of in-house service offered 

Publications produced from base by originator 

Fifty-six services are listed in the Carroll report. (One of these appears in name only - the 
IEEE Annual Index Tapes, a service which at the time was in the final stages of development.) 



The “Developers” 

The principal sources for commercially available scientific-technical services are learned and 
professional societies, publishing firms, and commercial organizations. The CCM Information 
Corporation offers six separate tape services, Chemical Abstracts Service offers seven, Derwent 
Publications, Ltd. (England) also offers seven, and Predicasts, Inc. offers five. Three tape services are 
offered by organizations within the U. S. Government: the two already mentioned, and MARC, the 
machine-readable cataloging distribution service offered by the Library of Congress. Four of the 
services are offered by institutions located outside of the United States: in addition to Derwent 
Publications, already mentioned, they are the Institute of Electrical Engineers (England), Shirley 
Institute (England), and the International Atomic Energy Agency (Austria). Finally, we might note 
that one service is directly affiliated with a university: that service is Petroleum Abstracts, produced 
by the Information Services Department of the University of Tulsa, Tulsa, Oklahoma. 



Virtually all of the organizations studied use their data bases to produce one or more 
publications; these are either bibliographies, indexes, abstracts, thesauri, keyword supplements, 
patent review books, data books, or similar products under different names. For example, the 
National Information System for Physics and Astronomy, which offers a tape service called SPIN 
(Searchable Physics Information Notices), publishes, among other products, the current awareness 
journal Current Physics Titles. 



Subject Coverage 

The subjects covered by tape services span almost the entire range of scientific knowledge, 
though coverage is not equally balanced from subject to subject. Chemistry and chemical engineering, 
for example, are specifically covered by eleven different tape services; of these, one focuses on 
marketing information, another on patent information, and two others on those portions of chemistry 
and chemical engineering which are directly pertinent to the petrochemical and petroleum refining 
industry. 

Nor is petroleum the only industry which receives explicit attention. Another example we 
could cite is the pulp and paper industry, which is the subject of three tape services, all of which are 
produced by the Institute of Paper Chemistry. These services comprise the tape equivalents of the 
Abstract Bulletin, of the Author and Patent Indexes for that publication, and of that publication's 
Keyword Supplement. Yet another example of an industry-oriented tape service is that offered to the 
textile industry by the Specialized Information Service Data Base, produced by Shirley Institute. 

Two of the services now available are concerned with polymers, plastics and macromolecules; 
one simply with plastics; and one with plastics and electrical/electronics engineering. 

One service covers diodes, transistors, microwave tubes and integrated circuits; one covers 
physics, electrical/electronics engineering, computers and control engineering; and one covers 
electrical/electronics engineering, computer science, and applied physics. The subject matter of 
another is physics and astronomy. 

Three services are concerned with the mathematical sciences; five with metallurgy, farming, 
agriculture, or the earth sciences; six with biochemistry, virology, or the life sciences. No less than 
seven are focused on statistical or financial information. 

Finally, an additional seven tape services provide broad or interdisciplinary coverage. These 
include: the CCM Corporation's Current Index to Conference Papers in Engineering; COMPENDEX, 
a service of Engineering Index, Inc., which covers all fields of engineering and certain fields of 
applied science and management; PANDEX, which provides broad coverage of scientific, technical 
and medical journals; U. S. Government R&D Reports, a service whose coverage includes not only 
scientific and technical subjects but social sciences as well; ERICTAPES, which are concerned with 
providing coverage of varied aspects of education; the Institute for Scientific Information's Combined 
Source and Citation Data Tape, which offers broad interdisciplinary coverage of journal literature, 
including the primary journals of basic and applied science, engineering and technology, medicine, 
psychology and psychiatry, and the behavioral sciences; and that same Institute's Source Data Tape, 
which provides similar coverage. 

One last service which we will single out for special attention is the INIS Output Tape, which 
is produced by the International Atomic Energy Agency and which covers nuclear science and 
technology. 

Volume of Data and Periodicity 

At this point it may be appropriate to give some idea of the volume of information provided 
by these services. The rather crude measuring unit for this purpose will be the number of items cited 
per tape. Of the total number of services for which information on this question is available, 
approximately one half cite more than 5,000 source items on each tape. Two of these in fact cite 
more than 20,000 such items. One is ICRS, the Index Chemicus Registry System tape, which cites 
4,000 abstracts and 17,000 Wiswesser Line Notations on each monthly tape, for a total of 21,000 
items; the other is Predicasts Corporation’s F&S Index of Corporations and Industries, which includes 
approximately 25,000 source citations on each of its quarterly tapes. 

Considering the wide variety of topics covered by these tape services, it is not surprising to 
find that the number of tapes issued each year is quite different from service to service. Virtually 
every conceivable time interval is represented - weekly issues, three issues a month, biweekly issues, 
semimonthly, monthly, eleven issues a year, quarterly, every four months, semiannually, and 
annually. Approximately 75% of all of the services issue tapes either on a monthly or an even more 
frequent basis. Of all the services, only one offers services on the basis of the frequency of issue 
requested by the particular subscriber; that service is the textile information service of the Shirley 
Institute in England. 

Combining now the information available on both the average number of source items cited 
on a tape, and the frequency of tape issue, we may conclude that, for 45 services for which sufficient 
data was available on this question, almost half of those services cite more than 25,000 source items 
annually. Of this group, seven cite more than 200,000 items annually, and of those seven, there are 
two which cite more than 300,000. 
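
The arithmetic behind such annual figures is simply items per tape multiplied by issues per year. The short Python sketch below illustrates this for the two large services named earlier; the function name and the frequency table are our own illustrative choices, and the per-year totals it prints are derived from the quoted per-tape figures rather than reported by the services.

```python
# Issues per year for a few of the frequencies named in the survey.
ISSUES_PER_YEAR = {
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "semiannually": 2,
    "annually": 1,
}


def annual_items(items_per_tape: int, frequency: str) -> int:
    """Estimate items cited per year as items-per-tape times issues-per-year."""
    return items_per_tape * ISSUES_PER_YEAR[frequency]


# Per-tape figures are quoted in the text; the annual totals are derived here.
icrs_per_year = annual_items(21_000, "monthly")        # ICRS monthly tape
fs_index_per_year = annual_items(25_000, "quarterly")  # F&S Index quarterly tape

print("ICRS:", icrs_per_year)
print("F&S Index:", fs_index_per_year)
```

On this estimate a service with a large tape but infrequent issue can rank well below a service with a smaller tape issued monthly.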



Cost 

The cost for all this information is not always low. However, more than 75% of the services 
are offered at annual costs of $2,500 or less. Apparently the most expensive service is the Institute 
for Scientific Information's Combined Source and Citation Data Tape; the subscription cost for this 
service is $20,000 a year. On the other extreme is the service offered by the International Atomic 
Energy Agency, which is free to member states. The subscription cost of the Library of Congress 
MARC tapes, which provide, on a weekly basis, current English-language monograph cataloging 
data, is $800 a year. 

Three of the services covered by the survey base their subscription charges on the subscriber’s 
gross assets. Two of these are offered by the American Petroleum Institute, and the third by the 
University of Tulsa’s Petroleum Abstracts service. As an indication that some subscribers do indeed 
have greater assets than others, we may note that average yearly costs for a subscription to the 
University of Tulsa service range from $200 a year to one hundred times that: $20,000. 

Sources of Data 

We ought at this point to get back to the question of what the information purchased from 
these services is all about. We have already discussed the subjects covered; let us now turn to the 
question of the scope of these services. Doing so, we may note that - as we might have expected - a 
large portion of current data bases are devoted to coverage of the journal literature. More than three 
out of four services are such that 50% or more of their data bases are devoted to journal coverage, 
and almost two out of three are such that journal coverage accounts for at least 80% of their total 
data base volume. 

The number of journals covered by the services varies from precisely one - which is the case 
for the Mathematics Computation Magnetic Tape, which is produced by the American Mathematical 
Society and which is comprised entirely of the contents of the one journal Computational 
Mathematics - to the 4,500 journals which contribute in an average year to the input of BA Previews, 
the tape produced by the BioSciences Information Service of Biological Abstracts. More than three 
out of four of the services for which information on this question is available provide coverage of at 
least 500 journals. 

A quite large percentage of this journal coverage is accounted for by English-language 
literature. Only one of the services has characterized its data base as predominantly (i.e., more than 
50%) non-English. That service is the one offered by the American Geological Institute. Its subject 
is the earth sciences (including areal, economic, engineering, extraterrestrial and marine geology; 
geochemistry; geochronology; geohydrology; geomorphology; and so forth); and it reviews 1,600 
journals for input, only 40% of which are in the English language. 
One data base characteristic which varies considerably from service to service is the 
percentage of journals which are entered in entirety, as contrasted with the percentage which are 
reviewed and entered into the data base only selectively. Approximately half of the services fall into 
the former category. Examples of services which enter journal literature into the data base on a 
selective basis are the Virology Index; Search-Data, a service offered by Compendium Publishers to 
provide abstracts with citations of original sources of marketing information in chemical and allied 
fields; and Expansion and Capacity Digest, one of the market-oriented tape services offered by 
Predicasts, Inc. Examples of data bases into which journals are entered in their entirety are the 
SPIN tape; PANDEX; the Chemical Abstracts Service's Basic Journal Abstracts; and the Institute for 
Scientific Information’s Combined Source and Citation Data Tape. 



Approximately one out of three of the scientific-technical services for which information is 
available indicated that at least some part of their data base is devoted to coverage of the report 
literature, but only three out of twenty devoted more than 10% of their data base to such coverage. 
The three services devoting the highest percentages of their data bases to coverage of Government- 
sponsored research described in the reports literature are: the International Atomic Energy Agency 
tape (25%); WORLDCASTS (25%); and, obviously, the U. S. Government R&D Reports tape 
(69%). 



Another source of input which deserves separate attention is the patent literature. Nine 
services devote at least 25% of their data base to such coverage; of those nine, there are four which 
are devoted exclusively to that purpose. 



Three of the scientific-technical tape services are devoted exclusively to the coverage of 
papers presented at conferences. The three services, all of which are offered by the CCM 
Corporation, are: the Current Index to Conference Papers in Chemistry; the Current Index to 
Conference Papers in Engineering; and the Current Index to Conference Papers in Life Sciences. 
Each of these indexes provides coverage of approximately 1,200 meetings in the field specified. The 
chemistry tape service reports on 7,500 individual papers, the engineering service on 32,000 papers, 
and the life sciences service on 15,000 papers. 



The data bases maintained by three others of the services are devoted to statistical or 
historical data. They are: D.A.T.A. Book Files, a service which is offered by a division of Computing 
and Software, Inc., and which is concerned exclusively with manufacturer-supplied information on 
diodes, transistors, microwave tubes, and integrated circuits; the Growth and Acquisition Guide tape, 
a service offered by Predicasts, Inc., to provide tabulated information on acquiring and acquired 
companies by four-digit line SIC, line of business, and latest annual sales given for both; and 
COMPUSTAT, a financial-oriented service which provides information on public-held companies. 
COMPUSTAT is offered by Investors Management Science and is comprised of data taken from 
company annual reports, government reports, daily news services and company contacts. 



Just as we have previously seen that commercially available tape services provide coverage of 
virtually every subject, so we now find that they monitor virtually all forms of printed information - 
books, journals, reports, monographs, theses, patents, newspapers and even private information 
sources. 
Having now a general picture of the characteristics of the data bases themselves, we may want 
to review briefly some of the techniques used for controlling those data bases. How is all of this 
information moved about? How is it indexed for effective storage in the data bases, and how is it 
searched for and retrieved? 

We come first to the question of subject analysis and indexing. Of the approximately 30 
services for which information on this question is available, roughly half assign an average of from 
five to ten indexing terms to each item in the data bases. However, several of the services use a far 
larger number of terms to describe an item. These services are: the American Petroleum Institute’s 
index to API Abstracts of Refining Literature, for which an average of 35 index terms or descriptors 
are assigned each entry; the IFI/Plenum Data Corporation’s Uniterm Index to U. S. Chemical and 
Chemically-Related Patents, for which the average number of index terms or descriptors assigned per 
item is also 35; Investors Management Sciences’ COMPUSTAT service, which on an average assigns 
approximately 60 index terms to each item; and SEARCH-DATA, of the Compendium Publishers 
International Corporation. The SEARCH-DATA service assigns on the average approximately 100 
index terms to each item included in the data base. 

The indexing terms and descriptors referred to in the above figures included both controlled 
descriptors and free-language terms. Of 36 services for which an answer to this question could be 
determined, nine (i.e., one out of four) relied entirely on free-language indexing, whereas 27 (i.e., 
three out of four) specified the use of controlled descriptors. However, of those 27 services which 
used controlled descriptors, approximately 50% employed free-language indexing as well. 

Relatively few of the services used classification schemes. Those which were used include: 
UDC (used by the tape service of the American Geological Institute), the American Mathematical 
Society’s Subject Classification Scheme, the classification system designed for Mathematics of 
Computation - Midwest Research Institute, the National Agriculture Library Classification, the 
Subject Headings for Engineering system, the IEE/IEEE INSPEC system, the International Atomic 
Energy Agency’s INIS classification schedule, and the U. S. Patent Office’s classification codes. 

About 40% of the data bases contain abstracts or their equivalent. 

Techniques for searching the various data bases differ considerably from one tape service to 
the next. Of the numerous services which cover authored material (journals, reports, etc.), all but one 
allow searching of the file for the name of the first author and (in most cases) all other co-authors as 
well. (The exception is CITE, a service of Engineering Index, Inc., devoted to applications 
technology in plastics and electrical/electronics engineering; CITE’s tape records include a 
“searchable” segment, composed entirely of index terms, and a nonsearchable “display” segment that 
identifies author, title, and citation.) 
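
The two-segment arrangement described for CITE can be pictured with a small Python sketch. This is only our illustration of the idea, not CITE's actual record format: the class name, the field names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    """Hypothetical two-segment record modeled on the description of CITE."""
    index_terms: frozenset  # "searchable" segment: index terms only
    display: str            # nonsearchable "display" segment: author, title, citation


def search(records, required_terms):
    """Return display segments of records carrying every required index term."""
    wanted = {term.lower() for term in required_terms}
    return [rec.display for rec in records if wanted <= rec.index_terms]


# Invented sample records, for illustration only.
records = [
    TapeRecord(frozenset({"plastics", "insulation", "cables"}),
               "DOE, J. Plastic insulation for cables. J. Hypothetical Eng. 1 (1970) 10."),
    TapeRecord(frozenset({"plastics", "molding"}),
               "ROE, R. Injection molding notes. J. Hypothetical Eng. 1 (1970) 25."),
]

for hit in search(records, ["plastics", "cables"]):
    print(hit)
```

Because the matching looks only at the index terms, a query on an author's name finds nothing even though the name is present in the display segment, which is the limitation noted above.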



In addition, some services allow a search based on the institute with which an author is 
affiliated, his location, the sponsor of his work, and/or the publisher of his book. Other services 
offering variations on “authorship” search include those which allow searches based on corporate 
authors, editors, patent assignees, or manufacturer names. 

Besides author’s name, another searchable data element allowed by almost all systems is, of 
course, the title of the article, report, or other authored document. Bibliographic information also 
offers a legitimate search device for most of the services considered. For journals, such information 
usually includes such searchable items as journal title, CODEN, journal volume, issue, and page 
numbers; for other material contained in the data base, a search may be conducted on such items as: 
conference name; report number; patent number; specially assigned document accession number; and 
so forth. 

But beyond such standard items as author, title, and basic bibliographic information, most 
services allow searching of the data base in various other ways; the extent of this variation can 
perhaps be indicated by a simple (but certainly not exhaustive) enumeration of some of the data 
elements upon which a search of the data base content may be conducted for one or more of the tape 
services. Such searchable data elements, then, include: descriptors (with or without links and roles); 
keyword phrases; words in a document's abstract; the language in which a document is written; 
primary and secondary subjects of a document; indexing terms and title enrichment terms; and 
classification codes. 



For an example of searchable data elements allowed by statistics-oriented tape services, one 
could cite WORLDCASTS, which allows searching on any of the following items: industry-product 
SIC; country; event; event code; year; earliest year first; quantities; smallest quantity in given year 
first; unit of measure by type of unit; source (publication); quote (the name of the person making the 
forecast). 

As a final example of a searchable data element, we might mention the provision in CCM 
Corporation’s Virology Index which allows for searching exclusively either for review articles or for 
articles of a non-review type. 
Of course, to perform a search on any of the data elements specified for any of the tape 
services, a subscriber needs not only the suitable computer hardware, but appropriate software as 
well. With reference to approximately one out of three presently available tape services, a tape 
subscriber will be required to develop his own software. In such a case, the subscribing institution 
will use the tapes strictly as additional input to its own system, and will use its own software and its 
own search strategy. 

However, in the remaining cases, the institution which produces the tape either already offers 
supporting software to its subscribers, or will develop whatever software is required for an interested 
customer. In addition, some tape services have indicated that, although they do not themselves offer 
supporting software, various suitable search programs are available elsewhere on the commercial 
market. 



Attempting to offer only the briefest profile of the characteristics of some of the programs 
available in the various packages, we will merely recite the principal features of just one important 
group of services (those provided by the CCM Corporation), and will supplement this recitation with 
an equally brief consideration of some of the typical ways in which other services deviate from that 
paradigm. 

CCM Corporation, then, provides COBOL programs both for print-out and for SDI; the 
appropriate computer configuration is an IBM 360 with disk operating system and a 32,000-word 
core memory. The system uses 7- or 9-track magnetic tape with 800 bits per inch. Records are 
written in either fixed-field or MARC II format, and coding is EBCDIC, BCDIC, or ASCII. 
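
To suggest what reading a fixed-field EBCDIC record involves for a subscriber writing his own software, we offer the following Python sketch. The field layout (author, title, year and their widths) is entirely hypothetical and is not the CCM layout; we also assume that Python's standard "cp037" codec is an acceptable stand-in for the EBCDIC variant on a given tape.

```python
# Hypothetical fixed-field layout: (field name, width in bytes).
FIELDS = (("author", 20), ("title", 40), ("year", 4))
RECORD_LENGTH = sum(width for _, width in FIELDS)


def parse_record(raw: bytes, encoding: str = "cp037") -> dict:
    """Decode one fixed-length record and slice it into named fields."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    text = raw.decode(encoding)  # cp037 is a single-byte EBCDIC code page
    record, offset = {}, 0
    for name, width in FIELDS:
        record[name] = text[offset:offset + width].rstrip()
        offset += width
    return record


def make_record(author: str, title: str, year: str, encoding: str = "cp037") -> bytes:
    """Build a sample record by padding or truncating each field to its width."""
    values = (author, title, year)
    text = "".join(v.ljust(w)[:w] for v, (_, w) in zip(values, FIELDS))
    return text.encode(encoding)


sample = make_record("SMITH J", "A SURVEY OF TAPE SERVICES", "1970")
print(parse_record(sample))
```

The same parser serves an ASCII tape if the `encoding` argument is changed to "ascii", which is one reason to keep the character coding separate from the field layout.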

The most obvious ways in which descriptions of the capabilities of other services differ from 
the CCM services pertain either to the programming languages used or to the basic choice of 
computer equipment. Other languages in which software is written for these tape services include 
assembly language, autocoder, FORTRAN, and FORTRAN with ALC. 

Other hardware configurations are found in systems which require more core memory (e.g., 
the Uniterm Index to U. S. Chemical and Chemically Related Patents, which uses a 256K memory), 
or different operating systems (although some services offer various operating system alternatives, 
such as the Index Chemicus Registry System, which comes in DOS/TOS and OS versions). Only two 
of the tape sources use equipment other than IBM’s; they are Predicasts, Inc., which uses a UNIVAC 
1108, and Investors Management Sciences, whose information retrieval program will run on a 
Control Data Corporation 3300 computer (as well as on an IBM 360). 

Other Services and Charges 

However, not all of the benefits which a subscriber may obtain from the tape services are 
dependent upon his own search capabilities and his own computer equipment. To the contrary, a 
number of the organizations producing scientific-technical tapes now offer, or are planning to offer, 
in-house search services. 



One such obvious service is the capability for conducting retrospective searches, through the 
data base, based on a subscriber’s search profile. The methods for calculating the charges for 
retrospective searches understandably vary from service to service. Retrospective searches performed 
in conjunction with the six tape services offered by Chemical Abstracts are based on a flat fee 
(ranging from $2,100 to $4,400) plus an assessment for the cost of actual computer time used to 
conduct the search. The University of Tulsa calculates charges for in-house retrospective file 
searching on the basis of $10 for each hour of search time in addition to $1.00 for every pertinent 
reference found. Biological Abstracts (BA Previews) charges $150 a search as does IFI/Plenum Data 
Corporation. (The latter also offers reduced rates for contracts of 50 searches.) The American 
Society of Metals (Metals Abstract Index Data Base) charges $250 for a search of this kind. A final 
arrangement we might mention is one devised by the American Geological Institute, which calculates 
its fees on the basis of a rate of $10 per query per 50 items retrieved. 
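
Two of these charging rules lend themselves to a brief worked example. The Python sketch below uses the rates quoted above; the function names are ours, and for the American Geological Institute we assume that a partly filled block of 50 items is charged as a full block, which the quoted rate does not actually state.

```python
import math


def tulsa_charge(search_hours: float, pertinent_refs: int) -> float:
    """University of Tulsa: $10 per hour of search time plus $1.00 per pertinent reference."""
    return 10.0 * search_hours + 1.00 * pertinent_refs


def agi_charge(items_retrieved: int) -> float:
    """American Geological Institute: $10 per query per 50 items retrieved.

    Assumption: a partial block of 50 items is billed as a whole block.
    """
    return 10.0 * math.ceil(items_retrieved / 50)


# A two-hour search yielding 40 pertinent references, and a query retrieving 120 items.
print(tulsa_charge(2, 40))
print(agi_charge(120))
```

Under either rule the charge rises with the size of the result, in contrast to the flat per-search fees quoted by the other services.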

Since “retrospective” searches have acquired that name for the very good reason that they 
proceed backwards into existing literature, it is important that we determine just how far back a 
researcher may “look” when he relies on these tape services. The basic answer to this question is that 
the time span available depends considerably on which particular service is of interest to the 
subscriber; some services provide coverage of their subject matter going back further than 1960; 
others go back no further than January 1970. As a general indication, we may note that 
approximately half of the tape services do not begin coverage of their subjects until 1968 or later. 

The other principal in-house service available from producers of scientific-technical tapes is 
SDI - Selective Dissemination of Information. Although a number of SDI services are still in 
development or early implementation stages, as many as five have been operating for one or more 
years. The pricing policies of SDI services are suggested by the following two basic kinds of rate 
structures in effect during 1970: 

1. The SDI service associated with the American Mathematical Society’s Mathematical 
Off-print Service offered title listings, abstracts, or off-prints; costs for these services were: 5¢ per title 
selected, 25¢ per abstract selected, and 45-85¢ per off-print selected (depending on article length). 

2. The American Society for Metals offered current awareness services at $250 a year per 
search profile. Biological Abstracts (BA Previews) offered CLASS (Current Literature Alerting 
Search Service) at $100 a year per search profile. An identical rate ($100/year/profile) was adopted 
both by the Keyword Supplement of the Institute of Paper Chemistry and by SEARCH-DATA. 
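
The difference between the two rate structures can be made concrete with a small calculation. The Python sketch below prices a hypothetical year of usage under the per-item rates of the first structure and compares it with a flat $100 profile fee of the second; the usage figures are invented, and the off-print rate is a parameter because the text gives only a range.

```python
def per_item_cost_cents(titles: int, abstracts: int, offprints: int,
                        offprint_rate_cents: int = 45) -> int:
    """Cost in cents at 5 cents per title, 25 cents per abstract, 45-85 cents per off-print."""
    if not 45 <= offprint_rate_cents <= 85:
        raise ValueError("off-print rate must lie between 45 and 85 cents")
    return 5 * titles + 25 * abstracts + offprint_rate_cents * offprints


FLAT_PROFILE_CENTS = 100 * 100  # $100 a year per search profile

# Invented usage for one year: 100 titles, 20 abstracts, 5 long off-prints.
usage_cents = per_item_cost_cents(100, 20, 5, offprint_rate_cents=85)
print(usage_cents / 100, "dollars per-item versus", FLAT_PROFILE_CENTS / 100, "dollars flat")

# Number of title listings alone that would equal the flat fee.
print(FLAT_PROFILE_CENTS // 5, "titles")
```

Amounts are kept in whole cents to avoid rounding error. The two structures cover different subject fields, so the comparison shows only how the charges scale, not which service is the better buy.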

Conclusion 

Thus we can conclude that a number of tape services exist, and that they provide coverage of 
different areas of knowledge, at different levels of depth, from different viewpoints, using different 
information control and search techniques - and making different demands on a user’s pocketbook. 
Furthermore, we confirm that the difficulties several university-based information services report 
with attempts to pool and efficiently use several tape services are neither imaginary nor understated: 
the range of variety of different characteristics is indeed very broad among the tape services we have 
compared. And it is equally easy to understand the feeling of indecision of a prospective user 
attempting to select the one tape service which optimally meets his situation. 

The premise which underlies the utility and validity of a comparative survey such as we have 
presented is the necessity and sufficiency of the parameters in terms of which such comparisons are 
made. We fear, however, that we cannot defend this premise; we do not know whether the parameters 
of comparison are useful for either of the two major clients interested in surveys - those attempting to 
select the best service for their needs, and those seeking to pool several tapes for a wider and more 
efficient service. Nor do we have any evidence that a much larger number of parameters (such as 
prepared by Schwartz¹) can be employed to construct a decision-making algorithm for either category 
of potential users, even if one assumed the unlikely situation that such detailed descriptions of data 
bases can be obtained and made public. 

Thus while paying attention to monitoring the characteristics of tape services, perhaps we 
ought to be giving more thought to the idea of surveying the customers, actual and potential. What 
experiences and recommendations do they have? What categories of parameters are of importance to 
them? What is the level of minimum compatibility, or desirable compatibility? Are there guidelines 
for the design of tapes and services which a users’ association might wish to impress or impose upon 
the producers? In proliferating diversity of technical design are we indeed concerned with the 
management of information as a national resource? 

These are among the many thoughts occurring in the margins of a simple survey of 
information tape services. 



¹ J. Schwartz. “A Checklist for the Examination of Data Base Systems.” New York University, 1970 
(Mimeographed), 6 p. 
1. Introduction 

The recent spectacular growth of the nuclear power industry and the accompanying increase 
in problems and concerns regarding nuclear safety have given rise to a deluge of information and 
data in all types of documents and formats. The Nuclear Safety Information Center (NSIC) is helping 
resolve the dilemma faced by scientists, engineers, and others in the field by refining and collating the 
information into more readily digested forms of output.(2) 

The purpose of this paper is to describe for the COSATI Forum of Federally Supported 
Information Analysis Centers, Washington, D. C., May 17-19, 1971, the computerized techniques 
which NSIC is using in its mission. Equally outstanding but non-computerized functions of NSIC 
such as state-of-the-art reports, the journal, Nuclear Safety, and technical consultation are outside the 
scope of this paper and will not be covered here. Prior to discussing the individual outputs, the 
Center’s information system and computer hardware and programs will be discussed. Following this, 
there will be a prognosis on what is foreseen in IAC computerized activities. 

2. Information System 

NSIC was established in 1963 by the USAEC Division of Reactor Development to collect, 
analyze, and disseminate nuclear-safety-oriented information throughout the nuclear 
community.(1,2) The Center’s subject scope is divided into the 21 categories listed in Table 1. 

Table 1. Information Categories 

1. General Safety Criteria 

2. Siting of Nuclear Facilities 

3. Transportation and Handling of Radioactive Materials 

4. Aerospace Safety 

5. Heat Transfer and Thermal Transients 

6. Reactor Transients, Kinetics, and Stability 

7. Fission Product Release, Transport, and Removal 

8. Sources of Energy Release Under Accident Conditions 

9. Nuclear Instrumentation, Control, and Safety Systems 

10. Electrical Power Systems 

11. Containment of Nuclear Facilities 

12. Plant Safety Features 

13. Radiochemical Plant Safety 

14. Radionuclide Release and Movement in the Environment 

15. Environmental Surveys, Monitoring, and Radiation Exposure of Man 

16. Meteorological Considerations 

17. Operational Safety and Experience 

18. Safety Analysis and Design Reports 

19. Radiation Dose to Man from Radioactivity Release to the Environment 

20. Effects of Thermal Modifications of Ecological Systems 

21. Effects of Radionuclides and Ionizing Radiation on Ecological Systems 



